LINEAR MODELS BASED ON NOISY DATA AND THE FRISCH SCHEME

LIPENG NING*, TRYPHON T. GEORGIOU†, ALLEN TANNENBAUM‡, AND STEPHEN P. BOYD§

Abstract. We address the problem of identifying linear relations among variables based on noisy measurements. This is, of course, a central question in problems involving "Big Data." Often a key assumption is that measurement errors in each variable are independent. This precise formulation has its roots in the work of Charles Spearman in 1904 and of Ragnar Frisch in the 1930's. Various topics such as errors-in-variables, factor analysis, and instrumental variables all refer to alternative formulations of the problem of how to account for the anticipated way that noise enters in the data. In the present paper we begin by describing the basic theory and provide alternative modern proofs to some key results. We then go on to consider certain generalizations of the theory as well as applying certain novel numerical techniques to the problem. A central role is played by the Frisch-Kalman dictum, which aims at a noise contribution that allows a maximal set of simultaneous linear relations among the noise-free variables; this is a rank minimization problem. In the years since Frisch's original formulation, there have been several insights, including trace minimization as a convenient heuristic to replace rank minimization. We discuss convex relaxations and certificates guaranteeing global optimality. A complementary point of view to the Frisch-Kalman dictum is introduced in which models lead to a min-max quadratic estimation error for the error-free variables. Points of contact between the two formalisms are discussed and various alternative regularization schemes are indicated.


1. Introduction. The standard paradigm in modeling is to postulate that measured quantities contain a contribution of "accidental deviation" [41] from the otherwise "uniformities" that characterize an underlying law. Therefore, a key issue when identifying dependencies between variables is how to account for the contribution of noise in the data. Various assumptions on the structure of noise and of the possible dependencies lead to a number of corresponding methodologies.

The purpose of the present paper is to consider, from a modern computational point of view, the important situation where the noise components are assumed independent, and the consequences of this assumption; the data is typically abstracted into a corresponding (estimated) covariance statistic. This independence assumption
underlies the errors-in-variables model [TTl [25] and factor analysis [31 [521 [THl [HI [HZ] , 
and has a century-old history [16l [35l [27]; see also [22l[23l[3Tl[4l[12l[40l[2j[T5]. 
Accordingly, given the large classical literature on this problem, this paper will also 
have a tutorial flavor. 
The precise formulation has its roots in the work of Ragnar Frisch in the 1930's. The central assumption is that the noise components are independent of the underlying variables and are also mutually independent [22], [23]. In addition, since several alternative linear relations are typically consistent with the data, a maximal set of simultaneous dependencies is sought as a means to limit uncertainty and to provide canonical models [22], [23]. This particular dictum gives rise to a (non-convex) rank-minimization problem. Thus, it is somewhat surprising that the special case where



*L. Ning is with the Dept. of Electrical & Comp. Eng., University of Minnesota, Minneapolis, Minnesota 55455, ningx015@umn.edu

†T. T. Georgiou is with the Dept. of Electrical & Comp. Eng., University of Minnesota, Minneapolis, Minnesota 55455, tryphon@umn.edu

‡A. Tannenbaum is with the Comprehensive Cancer Center and Dept. of Electrical & Comp. Eng., University of Alabama, Birmingham, AL 35294, tannenba@uab.edu

§S. P. Boyd is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, boyd@stanford.edu



the maximal number of possible simultaneous linear relations is equal to 1 can be explicitly characterized. This was accomplished over half a century ago by Reiersøl [35];
see also [321 US] • To date no other case is known that admits a precise closed- form 
solution. 

In recent years, emphasis has been shifting from hard, non-convex optimization 
to convex regularizations, which in addition scale nicely with the size of the problem. 
Following this trend we revisit the Frisch problem from several alternative angles. We 
first present an overview of the literature, and present several new insights and proofs. 
In the process, we also give an extension of Reiersøl's result to complex matrices.
Our main interest is in exploring recently studied convex optimization problems that 
approximate rank minimization by use of suitable surrogates. In particular, we study 
iterative schemes for treating the general Frisch problem and focus on certificates that 
guarantee optimality. In parallel, we consider a viewpoint that serves as an alternative 
to the Frisch problem where now, instead of a maximal number of simultaneous linear 
relations, we seek a uniformly optimal estimator for the unobserved data under the 
independence assumption of the Frisch scheme. The optimal estimator is obtained 
as a solution to a min-max optimization problem. Rank-regularized and min-max 
alternatives are discussed and an example is given to highlight the potential and 
limitations of the techniques. 

The remainder of this paper is organized as follows. We first introduce the errors-in-variables problem in Section 3. In Section 4 we revisit the Frisch problem, and a related problem due to Shapiro, and provide a geometric interpretation of Reiersøl's result along with a generalization to complex-valued covariances. In Section 5 we present an iterative trace-minimization scheme for solving the Frisch problem and provide computable lower bounds for the minimum rank. In Section 7 we bring up the question of estimation in the context of the Frisch scheme and motivate a suitable rank-regularized min-max optimization problem in Section 8.2. Some concluding remarks are provided in Section 10.



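2. Notation.

$\mathcal R(\cdot)$, $\mathcal N(\cdot)$ : range space, null space
$\Pi_{\mathcal X}$ : orthogonal projection onto $\mathcal X$
$> 0$ ($\geq 0$) : positive definite (resp., positive semi-definite)
$\mathbb S_n$ $= \{M \mid M \in \mathbb R^{n\times n},\ M = M'\}$
$\mathbb S_{n,+}$ $= \{M \mid M \in \mathbb S_n,\ M \geq 0\}$
$\mathbb H_n$ $= \{M \mid M \in \mathbb C^{n\times n},\ M = M^*\}$
$\mathbb H_{n,+}$ $= \{M \mid M \in \mathbb H_n,\ M \geq 0\}$
$[\cdot]_{k\ell}$ ($[\cdot]_k$) : $(k,\ell)$-th entry (resp., $k$-th entry)
$|M|$ : determinant of $M \in \mathbb R^{n\times n}$
$n_+(\cdot)$ : number of positive eigenvalues
$\mathrm{diag} : \mathbb R^{n\times n} \to \mathbb R^n : M \mapsto d$, where $[d]_i = [M]_{ii}$ for $i = 1, \ldots, n$
$\mathrm{diag}^* : \mathbb R^n \to \mathbb R^{n\times n} : d \mapsto D$, where $D$ is diagonal and $[D]_{ii} = [d]_i$ for $i = 1, \ldots, n$
$M \succ_e 0$ ($\succeq_e 0$, $\prec_e 0$, $\preceq_e 0$) : the off-diagonal entries are $> 0$ (resp. $\geq 0$, $< 0$, $\leq 0$), or can be made so by changing the signs of selected rows and corresponding columns

3. Data and basic assumptions. Consider a Gaussian vector $x$ taking values in $\mathbb R^{n\times 1}$ having zero mean and covariance $\Sigma$. We assume that it represents an additive mixture of a Gaussian "noise-free" vector $\hat x$ and a "noise component" $\tilde x$, thus

\[ x = \hat x + \tilde x. \tag{3.1} \]
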



The entries of the noise component $\tilde x$ are assumed independent of one another and independent of the entries of the noise-free vector $\hat x$, with both vectors having zero mean and covariances $\tilde\Sigma$ and $\hat\Sigma$, respectively. Thus,

\[ \mathcal E(\tilde x \tilde x') =: \tilde\Sigma \ \text{is diagonal}, \tag{3.2a} \]
\[ \mathcal E(\hat x \tilde x') = 0. \tag{3.2b} \]

Throughout, $\mathcal E(\cdot)$ denotes the expectation operation and $0$ denotes the zero vector/matrix of appropriate size. The noise-free entries of $\hat x$ are assumed to satisfy a set of $q$ simultaneous linear relations. Hence, $M'\hat x = 0$, with $M \in \mathbb R^{n\times q}$ and $n > \operatorname{rank}(M) = q > 0$. The problem is mainly to infer these relations. Equivalently, $\mathcal E(\hat x \hat x') =: \hat\Sigma$ has

\[ \operatorname{rank}(\hat\Sigma) = n - q \tag{3.2c} \]

and $\hat\Sigma M = 0$. Statistics are typically estimated from observation records. To this end, consider a sequence

\[ x_t \in \mathbb R^{n\times 1}, \quad t = 1, \ldots, T \]

of independent measurements (realizations) of $x$ and, likewise, let $\hat x_t$ and $\tilde x_t$ represent the corresponding values of the noise-free variable and noise components. Denote by

\[ X = [x_1\ x_2\ \ldots\ x_T] \in \mathbb R^{n\times T} \]

the matrix of observations of $x$ and similarly denote by $\hat X$ and $\tilde X$ the corresponding matrices of the noise-free and noise entries, respectively. Data for identifying relations among the noise-free variables are typically limited to the observation matrix $X$ and, neglecting a scaling factor of $1/T$, the data is typically abstracted in the form of a sample covariance $XX'$. For the most part we will assume that sample covariances are accurate approximations of true covariances, and hence the modeling assumptions amount to

\[ \tilde X \tilde X' \approx \text{diagonal}, \tag{3.3a} \]
\[ \hat X \tilde X' \approx 0, \tag{3.3b} \]
\[ \operatorname{rank}(\hat X) = n - q, \tag{3.3c} \]

since $M'\hat X = 0$.

The number of possible linear relations among the noise-free variables and the corresponding coefficient matrix need to be determined from either $X$ or $\Sigma$. This motivates the Frisch and Shapiro problems discussed in Section 4. An alternative set of problems can be motivated by the need to determine $\hat X$ from $X$ via a suitable decomposition

\[ X = \hat X + \tilde X \tag{3.4} \]

in a way that is consistent with the existence of a set of $q$ linear relations. We will return to this in Section 8.
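
The modeling assumptions (3.3a)-(3.3c) can be illustrated on synthetic data. The following is a minimal sketch and not part of the original development: the dimensions, the random basis for the noise-free subspace, and the per-channel noise levels are all hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_frisch_data(n=5, q=2, T=100_000, seed=0):
    """Draw T samples of x = x_hat + x_tilde with q linear relations on x_hat."""
    rng = np.random.default_rng(seed)
    # The noise-free part lives in an (n - q)-dimensional subspace.
    basis = rng.standard_normal((n, n - q))
    X_hat = basis @ rng.standard_normal((n - q, T))
    # Mutually independent noise with hypothetical per-channel standard deviations.
    noise_std = rng.uniform(0.5, 1.5, size=n)
    X_tilde = noise_std[:, None] * rng.standard_normal((n, T))
    return X_hat, X_tilde, X_hat + X_tilde

X_hat, X_tilde, X = simulate_frisch_data()
T = X.shape[1]
noise_cov = X_tilde @ X_tilde.T / T       # approximately diagonal, cf. (3.3a)
cross_cov = X_hat @ X_tilde.T / T         # approximately zero, cf. (3.3b)
rank_hat = np.linalg.matrix_rank(X_hat)   # equals n - q, cf. (3.3c)
```

In practice only `X` is observed; `X_hat` and `X_tilde` are available here solely because the data is simulated.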

4. The problems of Frisch and Shapiro. We begin with the Frisch problem concerning the decomposition of a covariance matrix $\Sigma$ that is consistent with the assumptions in Section 3. The fact that, in practice, $\Sigma$ is an empirical sample covariance motivates relaxing (3.2a)-(3.2c) in various ways. In particular, relaxation of the constraint $\tilde\Sigma \geq 0$ leads to the Shapiro problem.

Problem 1 (The Frisch problem). Given $\Sigma \in \mathbb S_{n,+}$, determine

\[ \mathrm{mr}_+(\Sigma) := \min\{\operatorname{rank}(\hat\Sigma) \mid \Sigma = \hat\Sigma + \tilde\Sigma,\ \hat\Sigma, \tilde\Sigma \geq 0,\ \tilde\Sigma \text{ is diagonal}\}. \tag{4.1} \]

Problem 2 (The Shapiro problem). Given $\Sigma \in \mathbb S_{n,+}$, determine

\[ \mathrm{mr}(\Sigma) := \min\{\operatorname{rank}(\hat\Sigma) \mid \Sigma = \hat\Sigma + \tilde\Sigma,\ \hat\Sigma \geq 0,\ \tilde\Sigma \text{ is diagonal}\}. \tag{4.2} \]


The Frisch problem was studied by several researchers, see e.g., [53J |3TJ 231 H5] 
and the references therein. On the other hand, Shapiro [37] introduced the above relaxed version, removing the requirement that $\tilde\Sigma \geq 0$, in an attempt to gain understanding of the algebraic constraints imposed by the off-diagonal elements of $\Sigma$ on the decomposition. We refer to $\mathrm{mr}_+(\cdot)$ as the Frisch minimum rank and $\mathrm{mr}(\cdot)$ as the Shapiro minimum rank. The former is lower semicontinuous whereas the latter is not, as stated next. This difference is crucial when applying this type of methodology to real data, where some form of continuity is necessary.

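Proposition 1. $\mathrm{mr}_+(\cdot)$ is lower semicontinuous whereas $\mathrm{mr}(\cdot)$ is not.

Proof: Assume that for a given $\Sigma > 0$ there exists a sequence $\Sigma_1, \Sigma_2, \ldots$ of positive definite matrices such that $\Sigma_i \to \Sigma$ while

\[ \mathrm{mr}_+(\Sigma_i) < \mathrm{mr}_+(\Sigma) \quad \text{for all } i = 1, 2, \ldots \]

Decompose $\Sigma_i = \hat\Sigma_i + D_i$ with $\operatorname{rank}(\hat\Sigma_i) < r := \mathrm{mr}_+(\Sigma)$, $\Sigma_i \geq D_i \geq 0$ and $D_i$ diagonal. Then there exist convergent subsequences $\hat\Sigma_{i_k} \to \hat\Sigma$ and $D_{i_k} \to D$, as $k \to \infty$. Since $\Sigma_{i_k} \to \hat\Sigma + D = \Sigma$, by the lower semicontinuity of the rank,

\[ \operatorname{rank}(\hat\Sigma) \leq \liminf_{k\to\infty} \operatorname{rank}(\hat\Sigma_{i_k}) < r = \mathrm{mr}_+(\Sigma). \]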

This is a contradiction. On the other hand, to see that mr(-) is not lower semicontin- 
uous consider 
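for $\epsilon > 0$. Clearly $\mathrm{mr}(\Sigma) = 2$. Also $\lim_{\epsilon\to 0} \Sigma_\epsilon = \Sigma$. Yet $\Sigma_\epsilon = \hat\Sigma_\epsilon + D_\epsilon$ while $\hat\Sigma_\epsilon$ has rank 1 and $D_\epsilon$ is diagonal ($\not\geq 0$). Hence $\mathrm{mr}(\Sigma_\epsilon) = 1$. $\Box$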

Assuming that the off-diagonal entries of $\Sigma > 0$ of size $n \times n$ are known with absolute certainty, any "minimum rank" ($\mathrm{mr}_+(\cdot)$ and $\mathrm{mr}(\cdot)$) is bounded below by the so-called Ledermann bound, i.e.,

\[ \frac{2n + 1 - \sqrt{8n+1}}{2} \leq \mathrm{mr}(\Sigma) \leq \mathrm{mr}_+(\Sigma), \tag{4.3} \]

which holds on a generic set of positive definite matrices $\Sigma$, that is, on a (Zariski open) subset of positive definite matrices. Equivalently, the set of matrices $\Sigma$ for which $\mathrm{mr}(\Sigma)$ is lower than the Ledermann bound is non-generic: their entries satisfy algebraic equations which fail under small perturbation. To see this, consider any factorization

\[ \hat\Sigma = FF', \]

with $F \in \mathbb R^{n\times r}$. There are $(n-r)r + \frac{r(r+1)}{2}$ independent entries in $F$ (when accounting for the action of a unitary transformation of $F$ on the right), whereas the values of the off-diagonal entries of $\Sigma$ impose $\frac{n(n-1)}{2}$ constraints. Thus, the number of independent entries in $F$ is at least the number of constraints when $(n-r)^2 \leq n + r$, which then leads to the inequality $\frac{2n+1-\sqrt{8n+1}}{2} \leq r$. The bound was first noted in [?], while the independence of the constraints has been detailed in [?]. In general, the computation of the exact value for $\mathrm{mr}_+(\Sigma)$ and $\mathrm{mr}(\Sigma)$ is a non-trivial matter. Thus, it is rather surprising that an exact analytic result is available for both, in the special case when $r = n - 1$. We review this next in the form of two theorems.
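
The counting argument behind (4.3) is easy to check numerically. The sketch below is illustrative only; the function names are hypothetical. It evaluates the Ledermann bound and the difference between the number of free parameters in $F$ and the number of off-diagonal constraints.

```python
import math

def ledermann_bound(n: int) -> float:
    """Lower bound (2n + 1 - sqrt(8n + 1)) / 2 on the generic minimum rank."""
    return (2 * n + 1 - math.sqrt(8 * n + 1)) / 2

def free_minus_constraints(n: int, r: int) -> int:
    """Independent entries of an n x r factor F (modulo rotations on the right)
    minus the number of off-diagonal constraints n(n-1)/2."""
    free = (n - r) * r + r * (r + 1) // 2
    constraints = n * (n - 1) // 2
    return free - constraints

# Example: for n = 6 the bound is 3, so generically no rank-2 factor exists.
bound_6 = ledermann_bound(6)
```

For instance, `free_minus_constraints(6, 2)` is negative while `free_minus_constraints(6, 3)` is zero, in line with the bound for $n = 6$.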

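Theorem 2 (Reiersøl's theorem [35]). Let $\Sigma \in \mathbb S_{n,+}$ and $\Sigma > 0$, then

\[ \mathrm{mr}_+(\Sigma) = n - 1 \iff \Sigma^{-1} \succ_e 0. \]

Theorem 3 (Shapiro's theorem [38]). Let $\Sigma \in \mathbb S_{n,+}$ and irreducible, then

\[ \mathrm{mr}(\Sigma) = n - 1 \iff \Sigma \preceq_e 0. \]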
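
The sign condition in Theorem 2 can be tested directly. The sketch below is an illustration under stated assumptions, not the authors' procedure: it checks whether sign changes of rows and matching columns make every off-diagonal entry strictly positive, using the fact that, up to a global sign flip, the only candidate sign vector is read off from the first row. The helper names and the test matrices are hypothetical.

```python
import numpy as np

def positive_offdiag_up_to_signs(M, tol=1e-12):
    """True if some diagonal sign matrix J makes all off-diagonal entries of
    J M J strictly positive (M symmetric)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    s = np.where(M[0] >= 0, 1.0, -1.0)   # candidate signs from the first row
    s[0] = 1.0
    flipped = (s[:, None] * M) * s[None, :]
    off = flipped[~np.eye(n, dtype=bool)]
    return bool(np.all(off > tol))

def reiersol_condition(Sigma):
    """Check the condition of Theorem 2 on the inverse covariance."""
    return positive_offdiag_up_to_signs(np.linalg.inv(Sigma))

S_good = np.ones((3, 3)) + np.eye(3)                      # positive inverse covariance
S_bad = np.array([[3.0, 1.0, 1.0], [1.0, 3.0, -1.0], [1.0, -1.0, 3.0]])
holds = reiersol_condition(np.linalg.inv(S_good))
fails = reiersol_condition(np.linalg.inv(S_bad))
```

When the check succeeds, Theorem 2 indicates that $\mathrm{mr}_+(\Sigma) = n - 1$, that is, only a single linear relation can be supported under the Frisch scheme.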



The characterization of covariance matrices $\Sigma$ for which $\mathrm{mr}_+(\Sigma) = n - 1$ was first recognized by T. C. Koopmans in 1937 [27] and proven by Reiersøl [35], who used the Perron-Frobenius theory to improve on Koopmans' analysis. Later on, R. E. Kalman streamlined and completed the steps in [22], relying again on the Perron-Frobenius theorem (see also Klepper and Leamer [26] for a detailed analysis). Our treatment below takes a slightly different angle and provides some geometric insight by pointing out, as a key reason, that the maximal number of vectors at an obtuse angle from one another can exceed the dimension of the ambient space by at most one (Corollary 4). We provide new proofs where we also utilize a dual formulation with an analogous decomposition of the inverse covariance.

4.1. A geometric insight. We begin with two basic lemmas for irreducible matrices $M \in \mathbb S_{n,+}$. Recall that a matrix is reducible if by permutation of rows and columns it can be brought into a block diagonal form; otherwise it is irreducible.

Lemma 4.1. Let $M > 0$ and irreducible. Then,

\[ M \preceq_e 0 \;\Rightarrow\; M^{-1} \succ_e 0. \tag{4.4} \]

Lemma 4.2. Let $M \geq 0$ and irreducible. Then,

\[ M \preceq_e 0 \;\Rightarrow\; \operatorname{nullity}(M) \leq 1. \tag{4.5} \]

Proof: It is easy to verify that for matrices of size $2 \times 2$, (4.4) holds true. Assume that the statement also holds true for matrices of size up to $k \times k$, for a certain value of $k$, and consider a matrix $M$ of size $(k+1) \times (k+1)$ with $M > 0$ and $M \preceq_e 0$. Partition

\[ M = \begin{bmatrix} A & b \\ b' & c \end{bmatrix} \]

so that $c$ is a scalar and, hence, $A$ is of size $k \times k$. Partitioning conformably,

\[ M^{-1} = \begin{bmatrix} F & g \\ g' & h \end{bmatrix} \]

where

\[ F = (A - bc^{-1}b')^{-1}, \quad g = -A^{-1}bh, \quad \text{and} \quad h = (c - b'A^{-1}b)^{-1} > 0. \]

For the case where $A$ is irreducible, because $A$ has size $k \times k$ and $A \preceq_e 0$, invoking our hypothesis we conclude that $A^{-1} \succ_e 0$. Now, since $b$ has only non-positive entries and $b \neq 0$, $g = -A^{-1}bh$ has positive entries. Since $-bc^{-1}b' \preceq_e 0$ and $A \preceq_e 0$, then $A - bc^{-1}b' \preceq_e 0$ is also irreducible. Thus $F = (A - bc^{-1}b')^{-1}$ has positive entries by hypothesis.

For the case where $A$ is reducible, permutation of columns and rows brings $A$ into a block-diagonal form with irreducible blocks. Thus, $A^{-1}$ is also a block-diagonal matrix with each block entry-wise positive. Because $M$ is irreducible, $b$ must have at least one non-zero entry corresponding to the rows of each diagonal block of $A$. Then $A - bc^{-1}b'$ is irreducible and $\preceq_e 0$. Also $A^{-1}b$ has all of its entries negative. Therefore $F = (A - bc^{-1}b')^{-1}$ and $g = -A^{-1}bh$ have positive entries. Therefore $M^{-1} \succ_e 0$. $\Box$

Proof: Rearrange rows and columns and partition

\[ M = \begin{bmatrix} A & B \\ B' & C \end{bmatrix} \]

so that $A$ is nonsingular and of maximal size, equal to the rank of $M$. Then

\[ C = B'A^{-1}B. \tag{4.6} \]

We first show that $B'A^{-1}B \succeq_e 0$. Assume that $A$ is irreducible. Then $A^{-1} \succ_e 0$. At the same time $B$ has non-positive entries, not all zero (since $M$ is irreducible). In this case, $B'A^{-1}B \succ_e 0$. If on the other hand $A$ is reducible, Lemma 4.1 applied to the (irreducible) blocks of $A$ implies that $A^{-1} \succeq_e 0$. Therefore, in this case, $B'A^{-1}B \succeq_e 0$.

Returning to (4.6) and in view of the fact that $C \preceq_e 0$, while $B'A^{-1}B \succeq_e 0$, we conclude that either $C$ is a scalar (and hence there are no off-diagonal negative entries), or both $C$ and $B'A^{-1}B$ are diagonal. The latter contradicts the assumption that $M$ is irreducible. Hence, the nullity of $M$ can be at most 1. $\Box$

Lemma 4.2 provides the following geometric insight, stated as a corollary.

Corollary 4. In any Euclidean space of dimension $n$, there can be at most $n + 1$ vectors forming an obtuse angle with one another.

Proof: The Grammian $M = [v_k' v_\ell]_{k,\ell=1}^{n+q}$ of a selection $\{v_k \mid k = 1, \ldots, n + q\}$ of such vectors has off-diagonal entries which are negative. Hence, by Lemma 4.2 the nullity of $M$ cannot exceed 1. $\Box$
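
The bound in Corollary 4 is attained by the directions from the centroid of a regular simplex to its vertices. The following sketch is an illustration with a hypothetical construction: the $n + 1$ vectors are written as rows in $\mathbb R^{n+1}$, but they span only the $n$-dimensional subspace orthogonal to the all-ones vector, so their Grammian has nullity exactly 1.

```python
import numpy as np

def simplex_directions(n):
    """Rows are n + 1 vectors spanning an n-dimensional subspace with
    pairwise obtuse angles (centered standard basis vectors)."""
    m = n + 1
    return np.eye(m) - np.ones((m, m)) / m

n = 4
V = simplex_directions(n)
gram = V @ V.T                                   # Grammian of the n + 1 vectors
offdiag = gram[~np.eye(n + 1, dtype=bool)]       # all equal to -1/(n + 1)
nullity = gram.shape[0] - np.linalg.matrix_rank(gram)
```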

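The necessity part of Theorem 3 is also a direct corollary of Lemma 4.2.

Corollary 5. Let $\Sigma \in \mathbb S_{n,+}$ and irreducible. Then

\[ \Sigma \preceq_e 0 \;\Rightarrow\; \mathrm{mr}(\Sigma) = n - 1. \]

Proof: Let $\Sigma = \hat\Sigma + \tilde\Sigma$ with $\tilde\Sigma$ diagonal and $\hat\Sigma \geq 0$. $\hat\Sigma$ is irreducible since $\Sigma$ is irreducible. From Lemma 4.2 the nullity of $\hat\Sigma$ is at most 1. Thus $\mathrm{mr}(\Sigma) = n - 1$. $\Box$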
4.2. A dual decomposition. The matrix inversion lemma provides a correspondence between an additive decomposition of a positive-definite matrix and a decomposition of its inverse, albeit with a different sign in one of the summands. This is stated next.

Lemma 4.3. Let

\[ \Sigma = D + FF' \tag{4.7} \]

with $\Sigma, D \in \mathbb S_{n,+}$, $\Sigma, D > 0$ and $F \in \mathbb R^{n\times r}$. Then

\[ S := \Sigma^{-1} = E - GG' \tag{4.8} \]

for $E = D^{-1}$ and $G = D^{-1}F(I + F'D^{-1}F)^{-1/2}$. Conversely, if (4.8) holds with $G \in \mathbb R^{n\times r}$, then so does (4.7) for $D = E^{-1}$ and $F = E^{-1}G(I - G'E^{-1}G)^{-1/2}$.

Proof: This follows from the identity $(I \pm MM')^{-1} = I \mp M(I \pm M'M)^{-1}M'$. $\Box$
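
Lemma 4.3 can be checked numerically on a random instance. The sketch below is illustrative: the sizes, the random $D$ and $F$, and the helper names are hypothetical. It forms $E$ and $G$ from $D$ and $F$ and compares $E - GG'$ with $\Sigma^{-1}$.

```python
import numpy as np

def inv_sqrt_spd(A):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V / np.sqrt(w)) @ V.T

def dual_decomposition(D, F):
    """Map Sigma = D + F F' to (E, G) with inv(Sigma) = E - G G'."""
    D_inv = np.linalg.inv(D)
    r = F.shape[1]
    G = D_inv @ F @ inv_sqrt_spd(np.eye(r) + F.T @ D_inv @ F)
    return D_inv, G

rng = np.random.default_rng(1)
n, r = 5, 2
D = np.diag(rng.uniform(0.5, 2.0, size=n))   # positive-definite diagonal part
F = rng.standard_normal((n, r))
Sigma = D + F @ F.T
E, G = dual_decomposition(D, F)
residual = np.linalg.norm(np.linalg.inv(Sigma) - (E - G @ G.T))
```

With $D$ invertible, $FF'$ and $GG'$ have the same rank, which is the observation used in the comparison of the primal and dual minimum ranks.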
Application of the lemma suggests the following variation to Frisch's problem.

Problem 3 (The dual Frisch problem). Given a positive-definite $n \times n$ symmetric matrix $S$, determine the dual minimum rank

\[ \mathrm{mr}_{\rm dual}(S) := \min\{\operatorname{rank}(\hat S) \mid S = E - \hat S,\ \hat S, E \geq 0,\ E \text{ is diagonal}\}. \]

Clearly, if $S = \Sigma^{-1} = E - GG'$ (as in (4.8)), then $E > 0$. Furthermore, a decomposition of $S$ always gives rise to a decomposition $\Sigma = D + FF'$ (as in (4.7)) with the terms $FF'$ and $GG'$ having the same rank. Thus, it is clear that

\[ \mathrm{mr}_+(\Sigma) \leq \mathrm{mr}_{\rm dual}(\Sigma^{-1}), \tag{4.9} \]

and that the above holds with equality when an optimal choice of $D = \tilde\Sigma$ in (4.1) is invertible. However, if $D$ is allowed to be singular, the rank of the summands $FF'$ and $GG'$ may not agree. This can be seen using the following example. Take

\[ \Sigma = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}. \]

It is clear that $\Sigma$ admits a decomposition $\Sigma = \hat\Sigma + \tilde\Sigma$, in correspondence with (4.7), where $\tilde\Sigma = D = \mathrm{diag}\{1, 1, 0\}$ while $\hat\Sigma = FF'$ as well as $F' = [1, 1, 1]$ are of rank one. On the other hand,

\[ S = \Sigma^{-1} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ -1 & -1 & 3 \end{bmatrix}. \]

Taking $E = \mathrm{diag}\{e_1, e_2, e_3\}$ in (4.8), it is evident that the rank of

\[ GG' = E - S = \begin{bmatrix} e_1 - 1 & 0 & 1 \\ 0 & e_2 - 1 & 1 \\ 1 & 1 & e_3 - 3 \end{bmatrix} \]

cannot be less than 2 without violating the non-negativity assumption for the summand $GG'$. The minimal rank for the factor $G$ is 2 and is attained by taking $e_1 = e_2 = 2$ and $e_3 = 5$.
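
The example can be verified numerically. The following sketch simply rebuilds $\Sigma$ from the stated $D = \mathrm{diag}\{1, 1, 0\}$ and $F' = [1, 1, 1]$, inverts it, and checks the ranks of the primal and dual summands for the choice $e_1 = e_2 = 2$, $e_3 = 5$; the variable names are illustrative.

```python
import numpy as np

D = np.diag([1.0, 1.0, 0.0])        # singular diagonal noise covariance
F = np.ones((3, 1))
Sigma = D + F @ F.T                 # primal decomposition with rank-one F F'
S = np.linalg.inv(Sigma)

E = np.diag([2.0, 2.0, 5.0])        # e1 = e2 = 2, e3 = 5
GGt = E - S                         # dual summand
rank_primal = np.linalg.matrix_rank(F @ F.T)
rank_dual = np.linalg.matrix_rank(GGt, tol=1e-9)
min_eig = np.linalg.eigvalsh(GGt).min()
```

The primal summand has rank 1 while the dual summand has rank 2 and is positive semi-definite, which illustrates that the ranks need not agree when $D$ is singular.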
On the other hand, in general, if we perturb $\Sigma$ to $\Sigma + \epsilon I$ and, accordingly, $D$ to $D + \epsilon I$, then

\[ \mathrm{mr}_{\rm dual}((\Sigma + \epsilon I)^{-1}) \leq \mathrm{mr}_+(\Sigma), \quad \forall \epsilon > 0. \tag{4.10} \]

Equality in (4.10) holds for sufficiently small values of $\epsilon$. Thus, $\mathrm{mr}_+$ and $\mathrm{mr}_{\rm dual}$ are closely related. However, it should be noted that $\mathrm{mr}_{\rm dual}(\cdot)$ fails to be lower semicontinuous, since a small perturbation of the off-diagonal entries can reduce $\mathrm{mr}_{\rm dual}(\cdot)$. Yet, interestingly, an exact characterization of $\mathrm{mr}_{\rm dual}(S) = n - 1$ can be obtained which is analogous to those for $\mathrm{mr}_+$ and $\mathrm{mr}$ being equal to $n - 1$; the condition for $\mathrm{mr}_{\rm dual}$ will be used to prove the Reiersøl and Shapiro theorems.

Theorem 6. For $S \in \mathbb S_{n,+}$, with $S > 0$ and irreducible,

\[ \mathrm{mr}_{\rm dual}(S) = n - 1 \iff S \succeq_e 0. \tag{4.11} \]



Proof: If $S \succeq_e 0$ and $E$ is diagonal satisfying $E \geq S > 0$, then $E - S = GG' \preceq_e 0$. By invoking Lemma 4.2 we deduce that if $E - S$ is singular, $\operatorname{rank}(G) = n - 1$. Hence, $\mathrm{mr}_{\rm dual}(S) = n - 1$.

To establish that $\mathrm{mr}_{\rm dual}(S) = n - 1 \Rightarrow S \succeq_e 0$, we assume that the condition $S \succeq_e 0$ fails and show that $\mathrm{mr}_{\rm dual}(S) < n - 1$. We first argue the case for a $3 \times 3$ matrix $S = [s_{ij}]_{i,j=1}^{3}$. Provided $S \not\succeq_e 0$ we can assume that it has strictly negative off-diagonal entries (which can be done by reflecting the signs of rows and columns). We now let

\[ e_i := s_{ii} - \frac{s_{ij}\, s_{ik}}{s_{jk}} \]

for $i \in \{1, 2, 3\}$ and $(i, j, k)$ being permutations of $(1, 2, 3)$. These are all positive. Let $E = \mathrm{diag}^*(e_1, e_2, e_3)$. It can be seen that $E - S \geq 0$ while $\operatorname{rank}(E - S) = 1$. To verify the latter observe that $E - S = vv'$ for

\[ v' = \left[\sqrt{e_1 - s_{11}},\ \sqrt{e_2 - s_{22}},\ \sqrt{e_3 - s_{33}}\right]. \]

This establishes the reverse implication for matrices of size $3 \times 3$.

We now assume that the statement holds true for matrices of size up to $(n-1) \times (n-1)$ for some $n \geq 4$ and use induction. So let $S$, $E$ be of size $n \times n$ with $S \not\succeq_e 0$ and $E$ diagonal. We need to prove that $\mathrm{mr}_{\rm dual}(S) < n - 1$. We partition

\[ S = \begin{bmatrix} A & b \\ b' & c \end{bmatrix}, \quad E = \begin{bmatrix} E_1 & 0 \\ 0 & e \end{bmatrix} \]

with $A$, $E_1$ being $(n-1) \times (n-1)$. For any $E$ such that $E - S \geq 0$, $e$ cannot be equal to $c$, otherwise $b = 0$ and $S$ is reducible. Further, $E - S \geq 0$ if and only if $e > c$ and

\[ M := E_1 - (A + b(e - c)^{-1}b') \geq 0. \]

The nullity of $E - S$ coincides with that of $M$. To prove our claim, it suffices to show that $A_e := A + b(e - c)^{-1}b' \not\succeq_e 0$, or that $A_e$ is reducible for some $e > c$. (Since, in either case, by our hypothesis, the nullity of $M$ for a suitable $E_1$ exceeds 1.)

We now consider two possible cases where $S \succeq_e 0$ fails. First, we consider the case where already $A \not\succeq_e 0$. Then obviously $A_e \not\succeq_e 0$ for $e - c$ sufficiently large. The second possibility is $S \not\succeq_e 0$ while $A \succeq_e 0$. But if $A$ is (transformed into) element-wise nonnegative, then $bb'$ must have at least one pair of negative off-diagonal entries. Then, consider $A_e = A + \lambda bb'$ for $\lambda = (e - c)^{-1} \in (0, \infty)$. Evidently, for certain values of $\lambda$ entries of $A_e$ change sign. If a whole row becomes zero for a particular value of $\lambda$, then $A_e$ is reducible. In all other cases, there are values of $\lambda$ for which $A_e \not\succeq_e 0$. This completes the proof. $\Box$

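4.3. Proof of Reiersøl's theorem (Theorem 2). We first show that $\Sigma^{-1} \succ_e 0$ implies $\mathrm{mr}_+(\Sigma) = n - 1$. From the continuity of the inverse, $(\Sigma + \epsilon I)^{-1} \succ_e 0$ for sufficiently small $\epsilon > 0$. Applying Theorem 6 we conclude that

\[ \mathrm{mr}_{\rm dual}((\Sigma + \epsilon I)^{-1}) = n - 1. \]

Since $\mathrm{mr}_+(\Sigma) \geq \mathrm{mr}_{\rm dual}((\Sigma + \epsilon I)^{-1})$ as in (4.10), we conclude that $\mathrm{mr}_+(\Sigma) = n - 1$.

To prove that $\mathrm{mr}_+(\Sigma) = n - 1 \Rightarrow \Sigma^{-1} \succ_e 0$, we show that assuming $\Sigma^{-1} \not\succ_e 0$ and $\mathrm{mr}_+(\Sigma) = n - 1$ together leads to a contradiction. From the continuity of the inverse and the lower semicontinuity of $\mathrm{mr}_+(\cdot)$ (Proposition 1), there exists a symmetric matrix $\Delta$ and an $\epsilon > 0$ such that

\[ (\Sigma + \epsilon\Delta)^{-1} \not\succeq_e 0, \quad \text{and} \quad \mathrm{mr}_+(\Sigma + \epsilon\Delta) = n - 1. \]

Then, from Theorem 6, $\mathrm{mr}_{\rm dual}((\Sigma + \epsilon\Delta)^{-1}) < n - 1$ while from (4.9)

\[ \mathrm{mr}_+(\Sigma + \epsilon\Delta) \leq \mathrm{mr}_{\rm dual}((\Sigma + \epsilon\Delta)^{-1}). \]

Thus, we have a contradiction and therefore $\Sigma^{-1} \succ_e 0$. $\Box$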

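4.4. Proof of Shapiro's theorem (Theorem 3). Given $\Sigma \geq 0$ consider $\lambda > 0$ such that $\lambda I - \Sigma > 0$, a diagonal $D$, and let $E := \lambda I - D$. Since $\Sigma - D = E - (\lambda I - \Sigma)$,

\[ \mathrm{mr}(\Sigma) = \mathrm{mr}_{\rm dual}(\lambda I - \Sigma). \tag{4.12} \]

If $\Sigma$ is irreducible and $\Sigma \preceq_e 0$, then $\lambda I - \Sigma$ is irreducible and $\lambda I - \Sigma \succeq_e 0$. It follows (Theorem 6) that $\mathrm{mr}_{\rm dual}(\lambda I - \Sigma) = n - 1$, and therefore $\mathrm{mr}(\Sigma) = n - 1$ as well.

For the reverse direction, if $\mathrm{mr}(\Sigma) = n - 1$ then $\mathrm{mr}_{\rm dual}(\lambda I - \Sigma) = n - 1$, which implies that $\lambda I - \Sigma \succeq_e 0$ and therefore that $\Sigma \preceq_e 0$. $\Box$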

The original proof in [55] claims that for any $\Sigma > 0$ of size $n \times n$ with $n > 3$ and $\Sigma \not\preceq_e 0$, there exists an $(n-1) \times (n-1)$ principal minor that is $\not\preceq_e 0$. This statement fails for the following sign pattern

■+ 
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+ 





+ 





+. 



This matrix cannot be transformed to have all nonpositive off-diagonal entries, yet all its $3 \times 3$ principal minors are $\preceq_e 0$.

4.5. Parametrization of solutions under Reiersøl's and Shapiro's conditions. For either the Frisch or the Shapiro problem, a solution is not unique in general. The parametrization of solutions to the Frisch problem when $\mathrm{mr}_+(\Sigma) = n - 1$ has been known and is briefly explained below (without proof). Interestingly, an analogous parametrization is possible for Shapiro's problem and this is given in Proposition 8 that follows; both are presented here for completeness of the exposition.

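Proposition 7. Let $\Sigma \in \mathbb S_{n,+}$ with $\Sigma > 0$ and $\Sigma^{-1} \succ_e 0$. The following hold:

i) For $D \geq 0$ diagonal with $\Sigma - D \geq 0$ and singular, there is a probability vector $p$ ($p$ has entries $\geq 0$ that sum up to 1) such that $(\Sigma - D)\Sigma^{-1}p = 0$.
ii) For any probability vector $p$,

\[ D = \mathrm{diag}^*\!\left( \frac{[p]_i}{[\Sigma^{-1}p]_i},\ i = 1, \ldots, n \right) \]

satisfies $\Sigma - D \geq 0$ and $\Sigma - D$ is singular.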
Proof: See glUS]. D 

Thus, solutions of Frisch's problem under Reiersøl's conditions are in bijective correspondence with probability vectors. A very similar result holds true for Shapiro's problem.
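
The parametrization in Proposition 7 ii) can be exercised on a small instance. The sketch below is illustrative only: the inverse covariance is a hypothetical choice with all entries positive, and the probability vector is drawn at random. It forms $D$ with $[D]_{ii} = [p]_i / [\Sigma^{-1}p]_i$ and inspects the spectrum of $\Sigma - D$.

```python
import numpy as np

def frisch_noise_from_probability(Sigma, p):
    """Diagonal D with [D]_ii = p_i / [inv(Sigma) p]_i."""
    return np.diag(p / np.linalg.solve(Sigma, p))

n = 4
S = np.ones((n, n)) + n * np.eye(n)      # hypothetical inverse covariance, entries > 0
Sigma = np.linalg.inv(S)
rng = np.random.default_rng(2)
p = rng.uniform(0.1, 1.0, size=n)
p /= p.sum()                             # probability vector

D = frisch_noise_from_probability(Sigma, p)
eigs = np.linalg.eigvalsh(Sigma - D)     # ascending; smallest should vanish
null_vec = np.linalg.solve(Sigma, p)     # inv(Sigma) p spans the null space
null_residual = np.linalg.norm((Sigma - D) @ null_vec)
```

Up to rounding, the smallest eigenvalue is zero and the remaining ones are positive, so $\Sigma - D$ is positive semi-definite with nullity 1, as the proposition asserts.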

Proposition 8. Let $\Sigma \in \mathbb S_{n,+}$ be irreducible and have $\leq 0$ off-diagonal entries. The following hold:

i) For $D$ diagonal with $\Sigma - D \geq 0$ and singular, there is a strictly positive vector $v$ such that $(\Sigma - D)v = 0$.
ii) For any strictly positive vector $v \in \mathbb R^{n\times 1}$,

\[ D = \mathrm{diag}^*\!\left( \frac{[\Sigma v]_i}{[v]_i},\ i = 1, \ldots, n \right) \tag{4.13} \]

satisfies $\Sigma - D \geq 0$ and $\Sigma - D$ is singular.

Proof: To prove i), we note that if $(\Sigma - D)v = 0$, then $v$ can be taken to have strictly positive entries. To see this consider $(\Sigma - D + \epsilon I)^{-1}$ for $\epsilon > 0$. From Lemma 4.1,

\[ (\Sigma - D + \epsilon I)^{-1} \succ_e 0 \]

and since $v$ is an eigenvector corresponding to its largest eigenvalue, a power iteration argument concludes that $v$ has strictly positive entries.

To prove ii), it is easy to verify that the diagonal matrix $D$ in (4.13) for a strictly positive $v$ satisfies $(\Sigma - D)v = 0$. We only need to prove that $\Sigma - D \geq 0$. Without loss of generality we assume that all the entries of $v$ are equal. (This can always be done by scaling the entries of $v$ and scaling accordingly rows and columns of $\Sigma$.) Since $v$ is a null vector of $\Sigma - D$ and since $M := \Sigma - D$ has $\leq 0$ off-diagonal entries,

\[ [M]_{ii} = \sum_{j \neq i} |[M]_{ij}|. \]

The Geršgorin Circle Theorem (e.g., see [?]) now states that every eigenvalue of $M$ lies within at least one of the closed discs $\left\{ \mathrm{Disk}\left([M]_{ii}, \sum_{j\neq i} |[M]_{ij}|\right),\ i = 1, \ldots, n \right\}$. No disc intersects the negative real line. Therefore $\Sigma - D \geq 0$. $\Box$

4.6. Decomposition of complex-valued matrices. Complex-valued covariance matrices are commonly used in radar and antenna arrays [42]. The rank of $\Sigma - D$, for noise covariance $D$ as in the Frisch problem, is an indication of the number of (dominant) scatterers in the scattering field. If this is of the same order as the number of array elements (e.g., $n - 1$), any conclusion about their location may be suspect. Thus, it is natural to seek conditions for $\mathrm{mr}_+(\Sigma) = n - 1$ analogous to those given by Reiersøl, for the case of complex covariances, as a possible warning. This we do next.

Consider complex-valued observation vectors $x_t = y_t + \mathrm{i} z_t$, $t = 1, \ldots, T$, where $\mathrm{i} = \sqrt{-1}$ and $y_t, z_t \in \mathbb R^{n\times 1}$, and set

\[ X = [x_1, \ldots, x_T] = Y + \mathrm{i} Z \]

with $Y = [y_1, \ldots, y_T]$, $Z = [z_1, \ldots, z_T]$. The (scaled) sample covariance is

\[ \Sigma = XX^* = \Sigma_r + \mathrm{i}\Sigma_i \in \mathbb H_{n,+}, \]

where the real part $\Sigma_r := YY' + ZZ'$ is symmetric, the imaginary part $\Sigma_i := ZY' - YZ'$ is anti-symmetric, and "$*$" denotes complex-conjugate transpose. As before, we consider a decomposition

\[ \Sigma = \hat\Sigma + D \]

with $\hat\Sigma \geq 0$ singular and $D \geq 0$ diagonal. We refer to [1], [8] for the special case where $\mathrm{mr}_+(\Sigma) = 1$. In this section we present a sufficient condition for a Reiersøl-case where $\mathrm{mr}_+(\Sigma) = n - 1$.

Before we proceed we note that re-casting the problem in terms of the real- valued 



R 



Er Ei 

E( E, 



eS 



2ri,+ 



does not allow taking advantage of earlier results. The structure of R with antisym- 
metric off-diagonal blocks implies that if [a', 5']' is a null vector then so is [—6', a']' 
(since, accordingly, a~\-\h and ia — 6 are both null vectors of E). Thus, in general, 
the nullity of R is not 1 and the theorem of Reiers0l is not applicable. Further, the 
corresponding noise covariance is diagonal with repeated blocks. 

The following lemmas for the complex case echo Lemma 14.11 and Lemma 14.21 
Lemma 4.4. Let M e H„_+ he irreducible. If the argument of each non-zero 
off-diagonal entry of —M is in {—-§;, ^) , then each entry of M~^ has argument in 



Assume that the 
1) matrix M that 



A 


b 


b* 


c 


F 


9 


9* 


h 



V 2 ~^ 2" ' 2 2"7' 

Proof: It is easy to verify the lemma for 2 x 2 matrices, 
statement holds for sizes up to n x n and consider an {n-\-l) x (n- 
satisfies the conditions of the lemma. Partition 



M = 
with A is of size n x n, and conformably. 



By assumption non-zero entries of —A and —6 have their argument in (— ^n+i , 2"+i )■ 
Then, by bounding the possible contribution of the respective terms, it follows that 
for the argument of each of the entries oi —A + bc^^b* is in (— ^ , ^) . Then, the ar- 
gument of each entry of F = (A-bc^^b*)^^ is in (-| + ^, f - ^); this follows by 
assumption since F is nx n. Clearly, (— ^ + ^, f — ^) C (— f + 2^) f ~ 2^)- 
Regarding g, by bounding the possible contribution of respective terms, we similarly 
conclude that the argument of each of its non-zero entries is in (— -^ 



2"+i ' 2 



2-.+ 1, 



Lemma 4.5. Let M e H,, 
off-diagonal entry of —M is in 



be irreducible. If the argument of each non-zero 
§:,§:), then rank(Af) > n - 1. 
Proof: First rearrange rows and columns of $M$, and partition as

\[ M = \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \]

so that $A$ is nonsingular and of size equal to the rank of $M$, which we denote by $r$. Then

\[ C = B^*A^{-1}B \tag{4.14} \]

and has size equal to the nullity of M . We now compare the argument of the off- 
diagonal entries of C and B*A~^B, and show they cannot be equal unless C is a 
scalar. Since the off-diagonal entries of —A have their argument in (— ^, ^) C 
(—5^, jp) , the off-diagonal entries of A'-^ have their argument in (— ^ + ^, ^ — ^) 
from Lemma [44l Now, the {k,£) entry of B* A^^B is 

[B*A-^B]m - Y.^B*]k,[A-%[B],, 

and the phase of each summand is 

eiTg{[B*]k^[A~%[By) e 



TT n TT TT TT 



Jul i]<^j \ 2 z*" 2""-'-' 2 2^ 2"^-'-/ 
Thus, the non-zero off-diagonal entries of B* A^^B have positive real part while 

arg{-[C]k,) e (-|^, 1^) . 

Hence, either the off-diagonal entries of B* A~^B and C are zero, in which case these 
are diagonal matrices and M must be reducible, or B*A~^B and C are both scalars. 
This concludes the proof. D 

Theorem 9. Let S S H„^+ be irreducible. If the argument of each non-zero 
off-diagonal entry of —T, is in (— jrr, 5^); then mr(S) = n — 1. 

Proof: The matrix E — Z? is irreducible since D is diagonal. If E — £> > and 
singular, and since the argument of each non-zero off-diagonal entry of — (E — D) is 
in (—5^, ^), Lemma [4 . 5 1 applies and gives that rank(E — D) — n — 1. D 

Clearly, since mr+(E) > mr(E), under the condition of Theorem [HI mr+(S) = 
n — 1. It is also clear that for S € H„,4. irreducible with all non-zero off-diagonal 
entries having argument in (— ^^r, ^rr), we also conclude that mrduai(*S') = n— 1. 

5. Trace minimization heuristics. The rank of a matrix is a non-convex function of its elements and the problem of finding the matrix of minimal rank within a given set is a difficult one, in general. Therefore, certain heuristics have been developed over the years to obtain approximate solutions. In particular, in the context of factor analysis, trace minimization has been pursued as a suitable heuristic [30], [37], [38], thereby relaxing the Frisch problem into

\[ \min_{D:\ \Sigma \geq D \geq 0} \operatorname{trace}(\Sigma - D), \]

for a diagonal matrix $D$; with a relaxation of $D \geq 0$ corresponding to Shapiro's problem. The theoretical basis for using the trace and, more generally, the nuclear norm for non-symmetric matrices, as a surrogate for the rank was provided by Fazel et al. [13], who proved that these constitute convex envelopes of the rank function on bounded sets of matrices.

The relation between minimum trace factor analysis and minimum rank factor
analysis goes back to Ledermann in JH) (see [S] and [351 ). Herein we only refer to 
two propositions which characterize minimizers for the two problems, Frisch's and 
Shapiro's, respectively. 

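Proposition 10 (0). Let $\Sigma = \hat\Sigma_1 + D_1 > 0$ for a diagonal $D_1 \geq 0$. Then,

$$(\hat\Sigma_1, D_1) = \arg\min\{\operatorname{trace}(\hat\Sigma) \mid \Sigma = \hat\Sigma + D > 0,\ \hat\Sigma \geq 0,\ \text{diagonal } D \geq 0\} \qquad (5.1a)$$
$$\Leftrightarrow\ \exists\, \Lambda_1 \geq 0 :\ \hat\Sigma_1 \Lambda_1 = 0 \ \text{ and } \ \begin{cases} [\Lambda_1]_{ii} = 1, & \text{if } [D_1]_{ii} > 0,\\ [\Lambda_1]_{ii} \geq 1, & \text{if } [D_1]_{ii} = 0. \end{cases}$$

Proposition 11 ([36]). Let $\Sigma = \hat\Sigma_2 + D_2 > 0$ for a diagonal $D_2$. Then,

$$(\hat\Sigma_2, D_2) = \arg\min\{\operatorname{trace}(\hat\Sigma) \mid \Sigma = \hat\Sigma + D > 0,\ \hat\Sigma \geq 0,\ \text{diagonal } D\} \qquad (5.1b)$$
$$\Leftrightarrow\ \exists\, \Lambda_2 \geq 0 :\ \hat\Sigma_2 \Lambda_2 = 0 \ \text{ and } \ [\Lambda_2]_{ii} = 1 \ \ \forall i.$$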



Evidently, when the solutions to these two problems differ and $D_1 \neq D_2$, there
exists $k \in \{1, \ldots, n\}$ such that

$$[D_2]_{kk} < 0 \ \text{ and } \ [D_1]_{kk} = 0.$$

Further, the essence of Proposition 11 is that a singular $\hat\Sigma$ originates from such a
minimization problem if and only if there is a correlation matrix in its null space.
The matrices $\Lambda_1$ and $\Lambda_2$ appear as Lagrange multipliers in the respective problems.

Factor analysis is closely related to low-rank matrix completion as well as to sparse
and low-rank decomposition problems. Typically, low-rank matrix completion asks for
a matrix $X$ which satisfies a linear constraint $\mathcal{A}(X) = b$ and has low/minimal rank
($\mathcal{A}(\cdot)$ denotes a linear map $\mathcal{A}: \mathbb{R}^{n\times n} \to \mathbb{R}^{p}$). Thus, factor analysis corresponds to
the special case where $\mathcal{A}(\cdot)$ maps $X$ onto its off-diagonal entries. In a recent work
by Recht et al. [34], the nuclear norm of $X$ was considered as a convex relaxation of
$\operatorname{rank}(X)$ for such problems and a sufficient condition for exact recovery was provided.
However, this sufficient condition amounts to the requirement that the null space
of $\mathcal{A}(\cdot)$ contains no matrix of low rank. Therefore, since in factor analysis diagonal
matrices are in fact contained in the null space of $\mathcal{A}(\cdot)$ and include matrices of low
rank, the condition in [34] does not apply directly. Other works on low-rank matrix
completion (see, e.g., [34, 6]) mainly focus on assessing the probability of exact
recovery and on constructing efficient computational algorithms for large-scale low-rank
completion problems [24, 25]. On the other hand, since diagonal matrices are sparse
(most of their entries are zero), the work on matrix decomposition into sparse and
low-rank components by Chandrasekaran et al. [7] is very pertinent. In this, the $\ell_1$
and nuclear norms were used as surrogates for sparsity and rank, respectively, and a
sufficient condition for exact recovery was provided which captures a certain
"rank-sparsity incoherence"; an analogous but stronger sufficient "incoherence" condition
which applies to problem (5.1b) is given in [36].

5.1. Weighted minimum trace factor analysis. Both $\operatorname{mr}(\Sigma)$ and $\operatorname{mr}_+(\Sigma)$ in
(4.1) and (4.2), respectively, remain invariant under scaling of rows and the
corresponding columns of $\Sigma$ by the same coefficients. On the other hand, the minimizers
in (5.1a) and (5.1b) and their respective ranks are not invariant under scaling. This
fact motivates weighted-trace minimization,

$$\min\{\operatorname{trace}(W\hat\Sigma) \mid \Sigma = \hat\Sigma + D,\ \hat\Sigma \geq 0,\ \text{diagonal } D \geq 0\}, \qquad (5.2)$$

given $\Sigma > 0$ and a diagonal weight $W > 0$. As before, the characterization of
minimizers relates to a suitable condition for the corresponding Lagrange multipliers:

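Proposition 12 ([38]). Let $\Sigma = \hat\Sigma_0 + D_0 > 0$ for a diagonal matrix $D_0 \geq 0$ and
consider a diagonal $W > 0$. Then,

$$(\hat\Sigma_0, D_0) = \arg\min\{\operatorname{trace}(W\hat\Sigma) \mid \Sigma = \hat\Sigma + D > 0,\ \hat\Sigma \geq 0,\ \text{diagonal } D \geq 0\} \qquad (5.3)$$
$$\Leftrightarrow\ \exists\, \Lambda_0 \geq 0 :\ \hat\Sigma_0 \Lambda_0 = 0 \ \text{ and } \ \begin{cases} [\Lambda_0]_{ii} = [W]_{ii}, & \text{if } [D_0]_{ii} > 0,\\ [\Lambda_0]_{ii} \geq [W]_{ii}, & \text{if } [D_0]_{ii} = 0. \end{cases}$$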

A corresponding sufficient and necessary condition for $(\hat\Sigma, D)$ to be a minimizer
in Shapiro's problem is that there exists a Grammian in the null space of $\hat\Sigma$ whose
diagonal entries are equal to the diagonal entries of $W$.

Minimum-rank solutions may be recovered as solutions to (5.3) using suitable
choices of weight. However, these choices depend on $\Sigma$ and are not known in advance;
this motivates the selection of a certain canonical $\Sigma$-dependent weight as well as iteratively
improving the choice of weight. One should note that since $D$ is diagonal, letting $W$
be a not-necessarily diagonal matrix does not change the problem: only the diagonal
entries of $W$ determine the minimizer.

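We first consider taking $W = \Sigma^{-1}$. A rationale for this choice is that the minimal
value in (5.2) bounds $\operatorname{mr}_+(\Sigma)$ from below, since for any decomposition $\Sigma = \hat\Sigma + D$,

$$\operatorname{rank}(\hat\Sigma) = \operatorname{trace}(\hat\Sigma^{\sharp}\hat\Sigma) \geq \operatorname{trace}\big((\hat\Sigma + D)^{-1}\hat\Sigma\big) = \operatorname{trace}(\Sigma^{-1}\hat\Sigma), \qquad (5.4)$$

where $\sharp$ denotes the Moore-Penrose pseudoinverse. Continuing with this line of
analysis,

$$\operatorname{rank}(\hat\Sigma) = \operatorname{trace}(\hat\Sigma^{\sharp}\hat\Sigma) \geq \operatorname{trace}\big((\hat\Sigma + \epsilon I)^{-1}\hat\Sigma\big) \qquad (5.5)$$

for any $\epsilon > 0$, suggests the iterative re-weighting process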

$$D_{(k+1)} := \arg\min \operatorname{trace}\big((\Sigma - D_{(k)} + \epsilon I)^{-1}(\Sigma - D)\big) \qquad (5.6)$$

for $k = 1, 2, \ldots$ and $D_{(0)} := 0$. In fact, as pointed out in [13], (5.6) corresponds to
minimizing $\log\det(\Sigma - D + \epsilon I)$ by local linearization.
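
The link between (5.6) and $\log\det$ can be made explicit with the standard first-order expansion of $\log\det$ around the current iterate. The shorthand $Y(D)$ and $Y_k$ below is introduced only for this sketch and does not appear in the original text.

```latex
% Shorthand (assumed for this sketch): Y(D) := \Sigma - D + \epsilon I,  Y_k := Y(D_{(k)}).
\log\det Y(D) \;\approx\; \log\det Y_k + \operatorname{trace}\big(Y_k^{-1}\,(Y(D) - Y_k)\big)
  \;=\; \underbrace{\log\det Y_k + \epsilon\operatorname{trace}(Y_k^{-1}) - n}_{\text{independent of } D}
  \;+\; \operatorname{trace}\big((\Sigma - D_{(k)} + \epsilon I)^{-1}(\Sigma - D)\big).
```

Minimizing the right-hand side over $D$ therefore amounts to minimizing the trace term in (5.6), so each re-weighting step minimizes a linearization of $\log\det(\Sigma - D + \epsilon I)$ at $D_{(k)}$.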

Next we provide a sufficient condition for $\hat\Sigma$ to be such a stationary point of (5.6),
i.e., for $\hat\Sigma$ to satisfy

$$\arg\min_{D:\,\hat\Sigma \geq D} \operatorname{trace}\big((\hat\Sigma + \epsilon I)^{-1}(\hat\Sigma - D)\big) = 0. \qquad (5.7)$$

The notation $\circ$ used below denotes the element-wise product between vectors or
matrices, which is also known as the Schur product [20]; likewise, for vectors $a, b \in \mathbb{R}^{n\times 1}$,
$a \circ b \in \mathbb{R}^{n\times 1}$ with $[a \circ b]_i = [a]_i [b]_i$.

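Proposition 13. Let $\hat\Sigma \in \mathbf{S}_{n,+}$ and let the columns of $U$ form a basis of $\mathcal{R}(\hat\Sigma)$.
If

$$\mathcal{R}(U \circ U) \subset \mathcal{R}\big(\Pi_{\mathcal{N}(\hat\Sigma)} \circ \Pi_{\mathcal{N}(\hat\Sigma)}\big), \qquad (5.8)$$

then $\hat\Sigma$ satisfies (5.7) for all $\epsilon \in (0, \epsilon_1)$ and some $\epsilon_1 > 0$.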

We first need the following result which generalizes [Ml Theorem 3.1]. 

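Lemma 5.1. For $A \in \mathbb{R}^{n\times p}$ and $B \in \mathbb{R}^{n\times q}$ having columns $a_1, \ldots, a_p$ and
$b_1, \ldots, b_q$, respectively, we let

$$C = [a_1 \circ b_1,\ a_1 \circ b_2,\ \ldots,\ a_2 \circ b_1,\ \ldots,\ a_p \circ b_q] \in \mathbb{R}^{n\times pq},$$
$$\phi: \mathbb{R}^{n} \to \mathbb{R}^{n}, \quad d \mapsto \operatorname{diag}\big(AA' \operatorname{diag}^*(d)\, BB'\big), \ \text{ and}$$
$$\psi: \mathbb{R}^{p\times q} \to \mathbb{R}^{n}, \quad \Delta \mapsto \operatorname{diag}(A \Delta B').$$

Then $\mathcal{R}(\phi) = \mathcal{R}(\psi) = \mathcal{R}\big((AA') \circ (BB')\big) = \mathcal{R}(C)$.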

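Proof: Since $\operatorname{diag}\big(AA' \operatorname{diag}^*(d)\, BB'\big) = \big((AA') \circ (BB')\big)d$, it follows that

$$\mathcal{R}(\phi) = \mathcal{R}\big((AA') \circ (BB')\big).$$

Moreover, $\operatorname{diag}(A \Delta B') = \sum_{i=1}^{p}\sum_{j=1}^{q} a_i \circ b_j\, [\Delta]_{ij}$, and then $\mathcal{R}(\psi) = \mathcal{R}(C)$. We only
need to show that $\mathcal{R}(C) = \mathcal{R}\big((AA') \circ (BB')\big)$. This follows from

$$(AA') \circ (BB') = \sum_{i=1}^{p}\sum_{j=1}^{q} (a_i a_i') \circ (b_j b_j') = \sum_{i=1}^{p}\sum_{j=1}^{q} (a_i \circ b_j)(a_i \circ b_j)' = CC'.$$

Thus $\mathcal{R}(C) = \mathcal{R}\big((AA') \circ (BB')\big)$. □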
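
The identities used in the proof of Lemma 5.1 are easy to check numerically. The following is a minimal sketch with random matrices; the sizes, the seed, and all variable names are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 5, 2, 3
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, q))
d = rng.standard_normal(n)
Delta = rng.standard_normal((p, q))

# Columns a_i o b_j, ordered as in the lemma (i in the outer loop, j in the inner loop).
C = np.column_stack([A[:, i] * B[:, j] for i in range(p) for j in range(q)])

# Schur (element-wise) product (AA') o (BB').
H = (A @ A.T) * (B @ B.T)

# phi(d) = diag(AA' diag*(d) BB') and psi(Delta) = diag(A Delta B').
phi_d = np.diag(A @ A.T @ np.diag(d) @ B @ B.T)
psi_Delta = np.diag(A @ Delta @ B.T)

print(np.allclose(phi_d, H @ d))                      # phi acts as multiplication by H
print(np.allclose(psi_Delta, C @ Delta.reshape(-1)))  # psi acts as multiplication by C
print(np.allclose(H, C @ C.T))                        # H = CC', hence equal ranges
```

Since `H = C @ C.T`, the ranges of `H` and `C` coincide, which is the only nontrivial step of the proof.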

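Proof [of Proposition 13]: Assume that $\hat\Sigma$ satisfies (5.7). If $\operatorname{rank}(\hat\Sigma) = r$,
let $\hat\Sigma = USU'$ be the eigendecomposition of $\hat\Sigma$ with $S = \operatorname{diag}^*(s)$ and $s \in \mathbb{R}^{r}$. Let
the columns of $V$ be an orthogonal basis of the null space of $\hat\Sigma$, i.e., $\Pi_{\mathcal{N}(\hat\Sigma)} = VV'$.
Then

$$(\hat\Sigma + \epsilon I)^{-1} = \big(\hat\Sigma + \epsilon\Pi_{\mathcal{R}(\hat\Sigma)} + \epsilon\Pi_{\mathcal{N}(\hat\Sigma)}\big)^{-1} = \big(\hat\Sigma + \epsilon\Pi_{\mathcal{R}(\hat\Sigma)}\big)^{\sharp} + \frac{1}{\epsilon}\Pi_{\mathcal{N}(\hat\Sigma)},$$

and

$$\arg\min_{D:\,\hat\Sigma \geq D} \operatorname{trace}\big((\hat\Sigma + \epsilon I)^{-1}(\hat\Sigma - D)\big) = \arg\min_{D:\,\hat\Sigma \geq D} \operatorname{trace}\Big(\big(\epsilon(\hat\Sigma + \epsilon\Pi_{\mathcal{R}(\hat\Sigma)})^{\sharp} + \Pi_{\mathcal{N}(\hat\Sigma)}\big)(\hat\Sigma - D)\Big).$$

From Proposition 12, (5.7) holds if there is $M \in \mathbf{S}_{n-r,+}$ such that

$$\operatorname{diag}(VMV') = \operatorname{diag}\big(\epsilon(\hat\Sigma + \epsilon\Pi_{\mathcal{R}(\hat\Sigma)})^{\sharp} + \Pi_{\mathcal{N}(\hat\Sigma)}\big). \qquad (5.9)$$

Obviously, if $\epsilon = 0$, $M = I$ satisfies the above equation. We consider a matrix $M$ of
the form $M = I + \Delta$. For (5.9) to hold, we need $\operatorname{diag}\big(\epsilon(\hat\Sigma + \epsilon\Pi_{\mathcal{R}(\hat\Sigma)})^{\sharp}\big)$ to be in the range
of $\psi$ for

$$\psi: \mathbf{S}_{n-r} \to \mathbb{R}^{n}, \quad \Delta \mapsto \operatorname{diag}(V \Delta V').$$

From Lemma 5.1, $\mathcal{R}(\psi) = \mathcal{R}\big(\Pi_{\mathcal{N}(\hat\Sigma)} \circ \Pi_{\mathcal{N}(\hat\Sigma)}\big)$. On the other hand, since

$$\epsilon\big(\hat\Sigma + \epsilon\Pi_{\mathcal{R}(\hat\Sigma)}\big)^{\sharp} = U \operatorname{diag}^*\Big(\frac{\epsilon}{s_1 + \epsilon}, \ldots, \frac{\epsilon}{s_r + \epsilon}\Big) U',$$

then $\operatorname{diag}\big(\epsilon(\hat\Sigma + \epsilon\Pi_{\mathcal{R}(\hat\Sigma)})^{\sharp}\big) \in \mathcal{R}(U \circ U)$. So if (5.8) holds, there is always a $\Delta$ such
that $M = I + \Delta$ satisfies (5.9). Moreover, it is also required that $I + \Delta \geq 0$. Since
the map from $\epsilon$ to $\Delta$ is continuous, for small enough $\epsilon$, i.e., in an interval $(0, \epsilon_1)$, the
condition $I + \Delta \geq 0$ can always be satisfied. □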

We note that (5.8) is a sufficient condition for $\hat\Sigma$ to be a stationary point of (5.7)
in both Frisch's and Shapiro's settings.

6. Certificates of minimum rank. We are interested in obtaining bounds on
the minimal rank for the Frisch problem so as to ensure optimality when candidate
solutions are obtained by the earlier optimization approach in (5.6).

The following two bounds were proposed in [44], and follow from Theorem 2.
However, both of these bounds require an exhaustive search which may be prohibitively
expensive when $n$ is large.

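Corollary 14. Let $\Sigma \in \mathbf{S}_{n,+}$ and $\Sigma > 0$. If there is an $s_1 \times s_1$ principal minor
of $\Sigma$ whose inverse is positive, then

$$\operatorname{mr}_+(\Sigma) \geq s_1 - 1. \qquad (6.1a)$$

If there is an $s_2 \times s_2$ principal minor of $\Sigma^{-1}$ which is element-wise positive, then

$$\operatorname{mr}_+(\Sigma) \geq s_2 - 1. \qquad (6.1b)$$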

Next we discuss three other bounds that are computationally more tractable;
the first two were proposed by Guttman [18]. Guttman's bounds are based on a
conservative assessment for the admissible range of each of the diagonal entries of
$D = \Sigma - \hat\Sigma$.

Proposition 15. Let $\Sigma \in \mathbf{S}_{n,+}$ and let

$$D_1 := \operatorname{diag}^*(\operatorname{diag}(\Sigma)),$$
$$D_2 := \big(\operatorname{diag}^*(\operatorname{diag}(\Sigma^{-1}))\big)^{-1}.$$

Then the following hold.

$$\operatorname{mr}_+(\Sigma) \geq n_+(\Sigma - D_1) \qquad (6.1c)$$
$$\operatorname{mr}_+(\Sigma) \geq n_+(\Sigma - D_2). \qquad (6.1d)$$

Further, $n_+(\Sigma - D_1) \leq n_+(\Sigma - D_2)$.

Proof: The proof follows from the fact that $\Sigma \geq D$ implies $D \leq D_2 \leq D_1$. See
[15] for details. □


It is also easy to see that $\operatorname{mr}(\Sigma) \geq n_+(\Sigma - D_1)$, which provides a lower bound for
the minimum rank in Shapiro's problem. Next we return to a bound which we noted
earlier in (5.4).

Proposition 16. Let $\Sigma \in \mathbf{S}_{n,+}$. Then the following holds:

$$\operatorname{mr}_+(\Sigma) \geq \min_{\Sigma \geq D \geq 0} \operatorname{trace}\big(\Sigma^{-1}(\Sigma - D)\big). \qquad (6.1e)$$

Proof: The statement follows readily from (5.4). □

Evidently an analogous statement holds for $\operatorname{mr}(\Sigma)$. We note that (6.1c) and
(6.1d) remain invariant under scaling of rows and corresponding columns, whereas
(6.1e) does not, hence these two cannot be compared directly.
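
As an illustration, the Guttman bounds (6.1c) and (6.1d) and the inequality (5.4) can be evaluated with a few lines of NumPy. This is only a sketch on synthetic data: the helper names `n_plus` and `guttman_bounds`, the eigenvalue tolerance, the problem sizes, and the seed are assumptions made here and are not part of the original text.

```python
import numpy as np

def n_plus(M, tol=1e-9):
    """Number of eigenvalues of the symmetric matrix M that exceed tol."""
    return int(np.sum(np.linalg.eigvalsh(M) > tol))

def guttman_bounds(Sigma):
    """Return (n_+(Sigma - D1), n_+(Sigma - D2)) as in (6.1c) and (6.1d)."""
    D1 = np.diag(np.diag(Sigma))
    D2 = np.diag(1.0 / np.diag(np.linalg.inv(Sigma)))
    return n_plus(Sigma - D1), n_plus(Sigma - D2)

# Synthetic covariance with a known rank-r "noise-free" part and diagonal noise.
rng = np.random.default_rng(0)
n, r = 8, 2
F = rng.standard_normal((n, r))
Sigma_hat = F @ F.T
D = np.diag(rng.uniform(1.0, 2.0, size=n))
Sigma = Sigma_hat + D

b1, b2 = guttman_bounds(Sigma)
# trace(Sigma^{-1} Sigma_hat), the right-hand side of (5.4) for this decomposition.
trace_value = np.trace(np.linalg.solve(Sigma, Sigma_hat))

print(b1, b2, trace_value)
```

For this construction the true decomposition has rank `r`, so both bounds and the trace value cannot exceed `r`; how close they come depends on the data.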

7. Correspondence between decompositions. We now return to the
decomposition of the data matrix $X = \hat X + \tilde X$ as in (3.4) and its relation to the corresponding
sample covariances. The decomposition of $X$ into "noise-free" and "noisy"
components implies a corresponding decomposition for the sample covariance, but in the
converse direction, a decomposition $\Sigma = \hat\Sigma + \tilde\Sigma$ leads to a family of compatible
decompositions for $X$, which corresponds to the boundary of a matrix-ball. This is
discussed next.

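Proposition 17. Let $X \in \mathbb{R}^{n\times T}$, and $\Sigma := XX'$. If

$$\Sigma = \hat\Sigma + \tilde\Sigma \qquad (7.1)$$

with $\hat\Sigma$, $\tilde\Sigma$ symmetric and non-negative definite, there exists a decomposition

$$X = \hat X + \tilde X \qquad (7.2a)$$

for which

$$\hat X \tilde X' = 0, \qquad (7.2b)$$
$$\hat\Sigma = \hat X \hat X', \qquad (7.2c)$$
$$\tilde\Sigma = \tilde X \tilde X'. \qquad (7.2d)$$

Further, all pairs $(\hat X, \tilde X)$ that satisfy (7.2a)-(7.2d) are of the form

$$\hat X = \hat\Sigma\Sigma^{-1}X + R^{1/2}V, \qquad \tilde X = \tilde\Sigma\Sigma^{-1}X - R^{1/2}V, \qquad (7.3)$$

with

$$R := \hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma \qquad (7.4a)$$
$$\phantom{R :}= \tilde\Sigma - \tilde\Sigma\Sigma^{-1}\tilde\Sigma \qquad (7.4b)$$
$$\phantom{R :}= \hat\Sigma\Sigma^{-1}\tilde\Sigma$$
$$\phantom{R :}= \tilde\Sigma\Sigma^{-1}\hat\Sigma,$$

and $V \in \mathbb{R}^{n\times T}$ such that $VV' = I$, $XV' = 0$.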

Proof: The proof relies on a standard lemma (p^ Theorem 2]) which states 
that if $A \in \mathbb{R}^{n\times T}$, $B \in \mathbb{R}^{n\times m}$ with $m \leq T$ are such that $AA' = BB'$, then $A = BU$ for
some $U \in \mathbb{R}^{m\times T}$ with $UU' = I$. Thus, we let $A := X$,

$$S := \begin{bmatrix} \hat\Sigma & 0 \\ 0 & \tilde\Sigma \end{bmatrix},$$

and $B := [\,I \ \ I\,]\, S^{1/2}$, where $S^{1/2}$ is the matrix square root of $S$. It follows that there
exists a matrix $U$ as above for which $A = BU$, and therefore we can take

$$\begin{bmatrix} \hat X \\ \tilde X \end{bmatrix} := S^{1/2}U.$$

This establishes the existence of the decomposition (7.2a).

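In order to parameterize all such pairs $(\hat X, \tilde X)$, let $U_0$ be an orthogonal (square)
matrix such that

$$XU_0 = [\,\Sigma^{1/2} \ \ 0\,].$$

Then $\hat X U_0$ and $\tilde X U_0$ must be of the form

$$\hat X U_0 =: [\,\hat X_1 \ \ \Delta\,], \qquad \tilde X U_0 =: [\,\tilde X_1 \ \ -\Delta\,], \qquad (7.5)$$

with $\hat X_1$, $\tilde X_1$ square matrices. Since

$$\begin{bmatrix} \hat X \\ \tilde X \end{bmatrix} [\,\hat X' \ \ \tilde X'\,] = \begin{bmatrix} \hat\Sigma & 0 \\ 0 & \tilde\Sigma \end{bmatrix},$$

then

$$\hat X_1 \hat X_1' + \Delta\Delta' = \hat\Sigma \qquad (7.6a)$$
$$\hat X_1 \tilde X_1' - \Delta\Delta' = 0 \qquad (7.6b)$$
$$\tilde X_1 \tilde X_1' + \Delta\Delta' = \tilde\Sigma. \qquad (7.6c)$$

Substituting $\hat X_1 \tilde X_1'$ for $\Delta\Delta'$ into (7.6a) and using the fact that $\tilde X_1 = X_1 - \hat X_1$ with
$X_1 = \Sigma^{1/2}$ we obtain that

$$\hat X_1 = \hat\Sigma\Sigma^{-1/2}.$$

Similarly, using (7.6c) instead, we obtain that

$$\tilde X_1 = \tilde\Sigma\Sigma^{-1/2}.$$

Substituting into (7.6b), (7.6a) and (7.6c) we obtain the following three relations

$$\Delta\Delta' = \hat\Sigma\Sigma^{-1}\tilde\Sigma = \hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma = \tilde\Sigma - \tilde\Sigma\Sigma^{-1}\tilde\Sigma.$$

Since $\Delta\Delta'$ and the $\Sigma$'s are all symmetric,

$$\Delta\Delta' = \tilde\Sigma\Sigma^{-1}\hat\Sigma$$

as well. Thus, $\Delta = R^{1/2}V_1$ with $V_1 V_1' = I$. The proof is completed by substituting
the expressions for $\hat X_1$, $\tilde X_1$ and $\Delta$ into (7.5). □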
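
The parameterization (7.3) can be checked numerically. The sketch below builds one admissible pair for a random data matrix; the particular split of $\Sigma$ (noise equal to a small multiple of the identity), the helper `psd_sqrt`, the sizes, and the seed are all assumptions chosen here for illustration, and the construction of `V` needs $T \geq 2n$.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a positive semidefinite matrix."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

rng = np.random.default_rng(2)
n, T = 3, 10
X = rng.standard_normal((n, T))
Sigma = X @ X.T

# A hypothetical admissible split Sigma = Sigma_hat + Sigma_tilde.
Sigma_tilde = 0.5 * np.linalg.eigvalsh(Sigma)[0] * np.eye(n)
Sigma_hat = Sigma - Sigma_tilde

Si = np.linalg.inv(Sigma)
R = Sigma_hat - Sigma_hat @ Si @ Sigma_hat        # (7.4a)

# Rows of V are orthonormal and orthogonal to the rows of X: V V' = I, X V' = 0.
_, _, Vt = np.linalg.svd(X)                       # Vt is T x T (full_matrices=True)
V = Vt[n:2 * n, :]

R_half = psd_sqrt(R)
X_hat = Sigma_hat @ Si @ X + R_half @ V           # (7.3)
X_tilde = Sigma_tilde @ Si @ X - R_half @ V

print(np.allclose(X_hat + X_tilde, X))                  # (7.2a)
print(np.allclose(X_hat @ X_tilde.T, 0.0, atol=1e-8))   # (7.2b)
print(np.allclose(X_hat @ X_hat.T, Sigma_hat))          # (7.2c)
print(np.allclose(X_tilde @ X_tilde.T, Sigma_tilde))    # (7.2d)
```

Any other choice of `V` with the same two properties gives another admissible pair, which is the "matrix-ball boundary" of decompositions described in the proposition.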
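Interestingly,

$$\operatorname{rank}(R) + \operatorname{rank}(\Sigma) = \operatorname{rank}\begin{bmatrix} \hat\Sigma & \hat\Sigma \\ \hat\Sigma & \Sigma \end{bmatrix} = \operatorname{rank}\begin{bmatrix} \hat\Sigma & 0 \\ 0 & \tilde\Sigma \end{bmatrix} = \operatorname{rank}(\hat\Sigma) + \operatorname{rank}(\tilde\Sigma),$$

and hence, the rank of the "uncertainty radius" $R$ of the corresponding $\hat X$ and
$\tilde X$ matrix spheres is

$$\operatorname{rank}(R) = \operatorname{rank}(\hat\Sigma) + \operatorname{rank}(\tilde\Sigma) - \operatorname{rank}(\Sigma).$$

When the goal is to identify $\hat X$ from the data matrix $X$, different criteria may be used
to quantify uncertainty. One such criterion is the rank of $R$, while another is its trace, which is
the variance of the estimation error in determining $\hat X$. This topic is considered next and
its relation to the Frisch decomposition is highlighted.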

8. Uncertainty and worst-case estimation. The basic premise of the
decomposition (7.1) is that, in principle, no probabilistic description of the data is needed.
Thus, under the assumptions of Proposition 17, $R$ represents a deterministic radius
of uncertainty in interpreting the data. On the other hand, when data and noise
are probabilistic in nature and represent samples of jointly Gaussian random vectors
$x, \hat x, \tilde x$ as in (3.1)-(3.2a), the conditional expectation of $\hat x$ given $x$ is $E\{\hat x \mid x\} = \hat\Sigma\Sigma^{-1}x$,
while the variance of the error

$$E\{(\hat x - \hat\Sigma\Sigma^{-1}x)(\hat x - \hat\Sigma\Sigma^{-1}x)'\} = \hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma = R$$

is the radius of the deterministic uncertainty set. Either way, it is of interest to assess
how this radius depends on the decomposition of $\Sigma$.

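8.1. Uniformly optimal decomposition. Since the decomposition of $\Sigma$ in the
Frisch problem is not unique, it is natural to seek a uniformly optimal choice of the
estimate $Kx$ for $\hat x$ over all admissible decompositions. To this end, we denote the
mean-squared-error loss function

$$L(K, \hat\Sigma, \tilde\Sigma) := \operatorname{trace}\big(E\big((\hat x - Kx)(\hat x - Kx)'\big)\big) = \operatorname{trace}\big(\hat\Sigma - K\hat\Sigma - \hat\Sigma K' + K(\hat\Sigma + \tilde\Sigma)K'\big), \qquad (8.1)$$

and define

$$\mathcal{S}(\Sigma) := \{(\hat\Sigma, \tilde\Sigma) : \Sigma = \hat\Sigma + \tilde\Sigma,\ \hat\Sigma, \tilde\Sigma \geq 0 \text{ and } \tilde\Sigma \text{ is diagonal}\}$$

as the set of all admissible pairs. Thus, a uniformly optimal decomposition of $X$ into
signal plus noise relates to the following min-max problem:

$$\min_{K} \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} L(K, \hat\Sigma, \tilde\Sigma). \qquad (8.2)$$

The minimizer of (8.2) is the uniformly optimal estimator gain $K$. Analogous
min-max problems, over different uncertainty sets, have been studied in the literature [12].
In our setting

$$\min_{K} \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} L(K, \hat\Sigma, \tilde\Sigma) \geq \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \min_{K} L(K, \hat\Sigma, \tilde\Sigma) \qquad (8.3a)$$
$$= \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \operatorname{trace}\big(\hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma\big) \qquad (8.3b)$$
$$= \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \operatorname{trace}\big(\tilde\Sigma - \tilde\Sigma\Sigma^{-1}\tilde\Sigma\big). \qquad (8.3c)$$

The functions to maximize in (8.3b) and (8.3c) are both strictly concave in $\hat\Sigma$ and $\tilde\Sigma$,
respectively. Therefore the maximizer is unique. Thus, we denote

$$(K_{\rm opt}, \hat\Sigma_{\rm opt}, \tilde\Sigma_{\rm opt}) := \arg\max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \min_{K} L(K, \hat\Sigma, \tilde\Sigma), \qquad (8.4)$$

where, clearly, $K_{\rm opt} = \hat\Sigma_{\rm opt}\Sigma^{-1}$.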
In general, the decomposition suggested by the uniformly optimal estimation
problem does not lead to a singular signal covariance $\hat\Sigma$. The condition for when that
happens is given next. Interestingly, this is expressed in terms of half the candidate
noise covariance utilized in obtaining one of the Guttman bounds (Proposition 15).

Proposition 18. Let $\Sigma > 0$, and let

$$D_0 := \tfrac{1}{2}\operatorname{diag}^*(\operatorname{diag}(\Sigma^{-1}))^{-1} \qquad (8.5)$$

(which is equal to $\tfrac{1}{2}D_2$ defined in Proposition 15). If $\Sigma - D_0 \geq 0$, then

$$\tilde\Sigma_{\rm opt} = D_0 \ \text{ and } \ \hat\Sigma_{\rm opt} = \Sigma - D_0. \qquad (8.6a)$$

Otherwise,

$$\tilde\Sigma_{\rm opt} \leq D_0 \ \text{ and } \ \hat\Sigma_{\rm opt} \text{ is singular.} \qquad (8.6b)$$

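Proof: From (8.3c),

$$L(K_{\rm opt}, \hat\Sigma_{\rm opt}, \tilde\Sigma_{\rm opt}) = \max\{\operatorname{trace}(\tilde\Sigma - \tilde\Sigma\Sigma^{-1}\tilde\Sigma) \mid \Sigma \geq \tilde\Sigma \geq 0,\ \tilde\Sigma \text{ is diagonal}\}$$
$$\leq \max\{\operatorname{trace}(\tilde\Sigma - \tilde\Sigma\Sigma^{-1}\tilde\Sigma) \mid \tilde\Sigma \text{ is diagonal}\} \qquad (8.7)$$
$$= \tfrac{1}{2}\operatorname{trace}(D_0)$$

with the maximum attained for $\tilde\Sigma = D_0$. Then (8.6a) follows. In order to prove
(8.6b), consider the Lagrangian corresponding to (8.3c),

$$\mathcal{L}(\tilde\Sigma, \Lambda_0, \Lambda_1) = \operatorname{trace}\big(\tilde\Sigma - \tilde\Sigma\Sigma^{-1}\tilde\Sigma + \Lambda_0(\Sigma - \tilde\Sigma) + \Lambda_1\tilde\Sigma\big),$$

where $\Lambda_0$, $\Lambda_1$ are Lagrange multipliers. The optimal values satisfy

$$[I - 2\Sigma^{-1}\tilde\Sigma_{\rm opt} - \Lambda_0 + \Lambda_1]_{kk} = 0, \quad \forall\, k = 1, \ldots, n, \qquad (8.8a)$$
$$\Lambda_0\hat\Sigma_{\rm opt} = 0, \quad \Lambda_0 \geq 0, \qquad (8.8b)$$
$$\Lambda_1\tilde\Sigma_{\rm opt} = 0, \quad \Lambda_1 \geq 0 \text{ and is diagonal.} \qquad (8.8c)$$

If $\Sigma - D_0 \not\geq 0$ we show that $\hat\Sigma_{\rm opt}$ is singular. Assume the contrary, i.e., that $\hat\Sigma_{\rm opt} > 0$.
From (8.8b), we see that $\Lambda_0 = 0$, while from (8.8a), $[I - 2\Sigma^{-1}\tilde\Sigma_{\rm opt}]_{kk} \leq 0$. This gives
that

$$[\tilde\Sigma_{\rm opt}]_{kk} \geq \frac{1}{2[\Sigma^{-1}]_{kk}} = [D_0]_{kk},$$

for all $k = 1, \ldots, n$, which contradicts the fact that $\Sigma - D_0 \not\geq 0$. Therefore $\hat\Sigma_{\rm opt}$ is
singular. We now assume that $\tilde\Sigma_{\rm opt} \not\leq D_0$. Then there exists $k$ such that $[\tilde\Sigma_{\rm opt}]_{kk} >
[D_0]_{kk}$. From (8.8c) and (8.8a), we have that

$$[\Lambda_1]_{kk} = 0 \ \text{ and } \ [I - 2\Sigma^{-1}\tilde\Sigma_{\rm opt}]_{kk} \geq 0,$$

which contradicts the assumption that $[\tilde\Sigma_{\rm opt}]_{kk} > [D_0]_{kk}$. Therefore $\tilde\Sigma_{\rm opt} \leq D_0$ and
(8.6b) has been established. □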
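We remark that while

$$E\big((\hat x - Kx)(\hat x - Kx)'\big) = \hat\Sigma - K\hat\Sigma - \hat\Sigma K' + K\Sigma K' \geq \hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma$$

is matrix-convex in $K$ and has a unique minimum for $K = \hat\Sigma\Sigma^{-1}$, the error covariance
$\hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma$ may not have a unique maximum in the positive semi-definite sense. To
see this, consider

$$\Sigma = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

In this case $D_0 = \tfrac{3}{4} I$,

$$\hat\Sigma_{\rm opt} = \begin{bmatrix} 5/4 & 1 \\ 1 & 5/4 \end{bmatrix}, \ \text{ and } \ \hat\Sigma_{\rm opt} - \hat\Sigma_{\rm opt}\Sigma^{-1}\hat\Sigma_{\rm opt} = \begin{bmatrix} 3/8 & 3/16 \\ 3/16 & 3/8 \end{bmatrix}. \qquad (8.9)$$

On the other hand, for

$$\hat\Sigma = \begin{bmatrix} 3/2 & 1 \\ 1 & 3/2 \end{bmatrix}, \ \text{ then } \ \hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma = \begin{bmatrix} 1/3 & 1/12 \\ 1/12 & 1/3 \end{bmatrix},$$

which is neither larger nor smaller than (8.9) in the sense of semi-definiteness. This
is a key reason for considering scalar loss functions of the error covariance as in (8.1).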
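
The numbers in this $2\times 2$ example can be reproduced exactly with rational arithmetic. The snippet below is a small check written for this purpose; the helper names `matmul`, `inv2`, and `error_cov` are introduced here and are not from the original text.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def error_cov(S_hat, Sigma):
    """Error covariance S_hat - S_hat Sigma^{-1} S_hat."""
    P = matmul(matmul(S_hat, inv2(Sigma)), S_hat)
    return [[S_hat[i][j] - P[i][j] for j in range(2)] for i in range(2)]

Sigma = [[Fr(2), Fr(1)], [Fr(1), Fr(2)]]
Si = inv2(Sigma)
D0 = [Fr(1, 2) / Si[k][k] for k in range(2)]          # diagonal of D_0 in (8.5)
S_opt = [[Sigma[0][0] - D0[0], Sigma[0][1]],
         [Sigma[1][0], Sigma[1][1] - D0[1]]]           # Sigma - D_0
R_opt = error_cov(S_opt, Sigma)
R_alt = error_cov([[Fr(3, 2), Fr(1)], [Fr(1), Fr(3, 2)]], Sigma)

# The difference R_opt - R_alt has the form [[a, b], [b, a]],
# whose eigenvalues are a + b and a - b.
a = R_opt[0][0] - R_alt[0][0]
b = R_opt[0][1] - R_alt[0][1]
eigs = (a + b, a - b)
print(D0, R_opt, R_alt, eigs)
```

The two eigenvalues of the difference have opposite signs, so the two error covariances are not ordered in the semi-definite sense, even though the trace of `R_opt` is the larger of the two.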
Next we note that there is no gap between the min-max and max-min values in
the two sides of (8.3a).



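Proposition 19. For $\Sigma \in \mathbf{S}_{n,+}$,

$$\min_{K} \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} L(K, \hat\Sigma, \tilde\Sigma) = \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \min_{K} L(K, \hat\Sigma, \tilde\Sigma). \qquad (8.10)$$

Proof: We observe that for a fixed $K$, the function $L(K, \hat\Sigma, \tilde\Sigma)$ is a linear
function of $(\hat\Sigma, \tilde\Sigma)$. For fixed $(\hat\Sigma, \tilde\Sigma)$, the function is a convex function of $K$. Under
these conditions it is standard that (8.10) holds, see e.g. [3, page 281]. □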

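We remark that when $D_0 = \tfrac{1}{2}\operatorname{diag}^*(\operatorname{diag}(\Sigma^{-1}))^{-1}$ is admissible as noise
covariance, i.e., $\Sigma - D_0 \geq 0$, the optimal signal covariance is $\hat\Sigma_{\rm opt} = \Sigma - D_0$, and
the gain matrix $K_{\rm opt} = \hat\Sigma_{\rm opt}\Sigma^{-1} = I - D_0\Sigma^{-1}$ has all diagonal entries equal to $\tfrac{1}{2}$.
Thus, with $K_{\rm opt}$ in (8.1) the mean-square-error loss is independent of $\hat\Sigma$ and equal to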
trace (^Kopt^K'^p^) for any admissible decomposition of E. 

We also remark that the key condition (Proposition 18)

$$\Sigma \geq \tfrac{1}{2}\operatorname{diag}^*(\operatorname{diag}(\Sigma^{-1}))^{-1} \ \Leftrightarrow\ 2\operatorname{diag}^*(\operatorname{diag}(\Sigma^{-1})) \geq \Sigma^{-1}$$

can be equivalently written as $\Sigma^{-1} \circ (2I - \mathbf{1}\mathbf{1}') \geq 0$, and interestingly, amounts
to the positive semi-definiteness of a matrix formed by changing the signs of all
off-diagonal entries of $\Sigma^{-1}$. The set of all such matrices, $\{S \mid S \geq 0,\ S \circ (2I - \mathbf{1}\mathbf{1}') \geq 0\}$,
is convex, invariant under scaling rows and corresponding columns, and contains the
set of diagonally dominant matrices $\{S \mid S \geq 0,\ [S]_{ii} \geq \sum_{j \neq i} |[S]_{ij}| \text{ for all } i\}$.
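
The equivalence between the two forms of the condition can be illustrated numerically. The sketch below tests both forms on two small matrices; the function names, the tolerance, and the test matrices are choices made here for illustration only.

```python
import numpy as np

def d0_admissible(Sigma, tol=1e-10):
    """Check Sigma - D_0 >= 0 with D_0 as in (8.5)."""
    D0 = 0.5 * np.diag(1.0 / np.diag(np.linalg.inv(Sigma)))
    return bool(np.linalg.eigvalsh(Sigma - D0).min() >= -tol)

def sign_flip_test(Sigma, tol=1e-10):
    """Check Sigma^{-1} o (2I - 11') >= 0, i.e. Sigma^{-1} with off-diagonal signs flipped."""
    S = np.linalg.inv(Sigma)
    n = S.shape[0]
    flipped = S * (2.0 * np.eye(n) - np.ones((n, n)))
    return bool(np.linalg.eigvalsh(flipped).min() >= -tol)

# Case 1: the 2x2 example of the text, where D_0 is admissible.
Sigma_a = np.array([[2.0, 1.0], [1.0, 2.0]])

# Case 2: a 3x3 covariance whose inverse has equal off-diagonal entries 0.6;
# flipping their signs destroys positive semi-definiteness.
S_inv = np.eye(3) + 0.6 * (np.ones((3, 3)) - np.eye(3))
Sigma_b = np.linalg.inv(S_inv)

for Sigma in (Sigma_a, Sigma_b):
    print(d0_admissible(Sigma), sign_flip_test(Sigma))
```

Both tests agree on each matrix, as the equivalence predicts: the condition holds for the first matrix and fails for the second.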
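We conclude this section by noting that $\operatorname{trace}(R_{\rm opt})$, with

$$R_{\rm opt} := \hat\Sigma_{\rm opt} - \hat\Sigma_{\rm opt}\Sigma^{-1}\hat\Sigma_{\rm opt},$$

quantifies the distance between admissible decompositions of $\Sigma$. This is stated next.

Proposition 20. For $\Sigma > 0$ and any pair $(\hat\Sigma, \tilde\Sigma) \in \mathcal{S}(\Sigma)$,

$$\operatorname{trace}\big((\hat\Sigma - \hat\Sigma_{\rm opt})\Sigma^{-1}(\hat\Sigma - \hat\Sigma_{\rm opt})'\big) \leq \operatorname{trace}(R_{\rm opt}).$$

Proof: Clearly $0 \leq \operatorname{trace}(\hat\Sigma - \hat\Sigma\Sigma^{-1}\hat\Sigma)$, while from Proposition 19,

$$L(K_{\rm opt}, \hat\Sigma, \tilde\Sigma) = \operatorname{trace}\big(\hat\Sigma - 2\hat\Sigma_{\rm opt}\Sigma^{-1}\hat\Sigma + \hat\Sigma_{\rm opt}\Sigma^{-1}\hat\Sigma_{\rm opt}'\big) \leq \operatorname{trace}(R_{\rm opt}). \qquad (8.11)$$

Thus, $\operatorname{trace}\big(\hat\Sigma\Sigma^{-1}\hat\Sigma - 2\hat\Sigma_{\rm opt}\Sigma^{-1}\hat\Sigma + \hat\Sigma_{\rm opt}\Sigma^{-1}\hat\Sigma_{\rm opt}'\big) \leq \operatorname{trace}(R_{\rm opt})$. □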

8.2. Uniformly optimal estimation and trace regularization. A
decomposition of $\Sigma$ in accordance with the min-max estimation problem of the previous
section often produces an invertible signal covariance $\hat\Sigma$. On the other hand, it is
often the case, and it is the premise of factor analysis, that $\hat\Sigma$ is singular of low rank
and, thereby, allows identifying linear relations in the data. In this section we consider
combining the mean-square-error loss function with a regularization term promoting a
low rank for the signal covariance $\hat\Sigma$ [13]. More specifically, we consider

$$J = \min_{K} \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \big(L(K, \hat\Sigma, \tilde\Sigma) - \lambda \cdot \operatorname{trace}(\hat\Sigma)\big) \qquad (8.12)$$

for $\lambda \geq 0$, and properties of its solutions.

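As noted in Proposition 19 (see [3, page 281]), here too there is no gap between
the min-max and the max-min, which becomes

$$\max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \min_{K}\ L(K, \hat\Sigma, \tilde\Sigma) - \lambda \cdot \operatorname{trace}(\hat\Sigma)$$
$$= \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \min_{K}\ \operatorname{trace}\big((1 - \lambda)\hat\Sigma - K\hat\Sigma - \hat\Sigma K' + K(\hat\Sigma + \tilde\Sigma)K'\big)$$
$$= \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \operatorname{trace}\big((1 - \lambda)\hat\Sigma - \hat\Sigma(\hat\Sigma + \tilde\Sigma)^{-1}\hat\Sigma\big) \qquad (8.13a)$$
$$= \max_{(\hat\Sigma,\tilde\Sigma) \in \mathcal{S}(\Sigma)} \operatorname{trace}\big(-\lambda\Sigma + (1 + \lambda)\tilde\Sigma - \tilde\Sigma(\hat\Sigma + \tilde\Sigma)^{-1}\tilde\Sigma\big). \qquad (8.13b)$$

Since (8.13a) and (8.13b) are strictly concave functions of $\hat\Sigma$ and $\tilde\Sigma$, respectively, there
is a unique set of optimal values $(K_{\lambda,\rm opt}, \hat\Sigma_{\lambda,\rm opt}, \tilde\Sigma_{\lambda,\rm opt})$.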

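Proposition 21. Let $\Sigma > 0$, $D_0 = \tfrac{1}{2}\big(\operatorname{diag}^*\operatorname{diag}(\Sigma^{-1})\big)^{-1}$, $\lambda_{\min}$ be the smallest
eigenvalue of $D_0^{-1/2}\Sigma D_0^{-1/2}$, and $(K_{\lambda,\rm opt}, \hat\Sigma_{\lambda,\rm opt}, \tilde\Sigma_{\lambda,\rm opt})$ as above, for $\lambda \geq 0$. For any
$\lambda \geq \lambda_{\min} - 1$, $\hat\Sigma_{\lambda,\rm opt}$ is singular.

Proof: The trace of $(-\lambda\Sigma + (1 + \lambda)\tilde\Sigma - \tilde\Sigma\Sigma^{-1}\tilde\Sigma)$ is maximal for the diagonal choice
$\tilde\Sigma = (1 + \lambda)D_0$. For any $\lambda > \lambda_{\min} - 1$, $\Sigma - (1 + \lambda)D_0$ fails to be positive semidefinite.
Thus, the constraint $\Sigma - \tilde\Sigma \geq 0$ in (8.13b) is active and $\hat\Sigma_{\lambda,\rm opt}$ is singular. □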

Note that $\Sigma - 2D_0 \not\geq 0$ (unless $\Sigma$ is diagonal), and therefore $\lambda_{\min} < 2$. Hence, for
$\lambda \geq 1$, $\hat\Sigma_{\lambda,\rm opt}$ is singular. When $\lambda \to 0$ we recover the solution in (8.4), whereas for
$\lambda \to \infty$ we recover the solution in Proposition 10.

9. Accounting for statistical errors. From an applications standpoint $\Sigma$
represents an empirical covariance, estimated on the basis of a finite observation record
in $X$. Hence (3.3a) and (3.3b) are only approximately valid, as already suggested in
Section 3. Thus, in order to account for sampling errors we can introduce a penalty
for the size of $C := \hat X\tilde X'$, conditioned so that

$$\Sigma = \hat\Sigma + \tilde\Sigma + C + C',$$

and a penalty for the distance of $\tilde\Sigma$ from the set $\{D \mid D \text{ diagonal}\}$.

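Alternatively, we can use the Wasserstein 2-distance [33, 32] between the
respective Gaussian probability density functions, which can be written in the form of a
semidefinite program

$$d(\hat\Sigma + D, \Sigma) = \min_{C_1} \left\{ \operatorname{trace}(\hat\Sigma + \Sigma + D + C_1 + C_1') \ \middle|\ \begin{bmatrix} \hat\Sigma + D & C_1 \\ C_1' & \Sigma \end{bmatrix} \geq 0 \right\}.$$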
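
For zero-mean Gaussians the optimal value of a program of this type is known in closed form, namely $\operatorname{trace}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$. This is a standard fact about the squared Wasserstein 2-distance and is not stated in the text above, so the sketch below should be read as an illustration under that assumption; the function names are hypothetical.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a positive semidefinite matrix."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def gaussian_w2_squared(S1, S2):
    """Squared Wasserstein 2-distance between N(0, S1) and N(0, S2) (closed form)."""
    root = psd_sqrt(S1)
    return float(np.trace(S1 + S2 - 2.0 * psd_sqrt(root @ S2 @ root)))

# For commuting (here diagonal) covariances the formula reduces to
# the sum of squared differences of the standard deviations.
S1 = np.diag([1.0, 4.0])
S2 = np.diag([9.0, 16.0])
print(gaussian_w2_squared(S1, S2))   # (1 - 3)^2 + (2 - 4)^2 = 8
print(gaussian_w2_squared(S1, S1))   # distance of a covariance to itself is 0
```

A closed form of this kind is convenient for checking the output of a semidefinite-programming solver on small instances.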



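Returning to the uncertainty radius of Section 7 and the problem discussed in
Section 8, we note that the problem

$$\max \min_{K} L(K, \hat\Sigma, D) = \max \operatorname{trace}\big(\hat\Sigma - \hat\Sigma(\hat\Sigma + D)^{-1}\hat\Sigma\big)$$

can be expressed as the semidefinite program

$$\max_{Q} \left\{ \operatorname{trace}(\hat\Sigma - Q) \ \middle|\ \begin{bmatrix} Q & \hat\Sigma \\ \hat\Sigma & \hat\Sigma + D \end{bmatrix} \geq 0 \right\}.$$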

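Thus, putting the above together, a formulation that incorporates the various tradeoffs
between the dimension of the signal subspace, mean-square-error loss, and statistical
errors is to maximize

$$\operatorname{trace}(\hat\Sigma - Q) - \lambda_1 \operatorname{trace}(\hat\Sigma) - \lambda_2 \operatorname{trace}(\hat\Sigma + D - C_1 - C_1') \qquad (9.1)$$

subject to

$$\begin{bmatrix} Q & \hat\Sigma \\ \hat\Sigma & \hat\Sigma + D \end{bmatrix} \geq 0, \qquad \begin{bmatrix} \hat\Sigma + D & C_1 \\ C_1' & \Sigma \end{bmatrix} \geq 0, \ \text{ with } D \geq 0 \text{ and diagonal.}$$

The values of the parameters $\lambda_1$, $\lambda_2$ dictate the relative importance that we place on
the various terms and determine the tradeoffs in the problem.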

We conclude with an example to highlight the potential and limitations of the
techniques. We generate data $X$ in the form

$$X = FV + \tilde X$$

where $F \in \mathbb{R}^{n\times r}$, $V \in \mathbb{R}^{r\times T}$, and $\tilde X \in \mathbb{R}^{n\times T}$ with $n = 50$, $r = 10$, $T = 100$.
The elements of $F$ and $V$ are generated from normal distributions with mean zero
and unit covariance. The columns of $\tilde X$ are generated from a normal distribution
with mean zero and diagonal covariance, itself having (diagonal) entries which are
uniformly drawn from the interval $[1, 10]$. The matrix $\Sigma = XX'$ is subsequently scaled so
that $\operatorname{trace}(\Sigma) = 1$. We determine

$$(\hat\Sigma, Q, D) = \arg\max \{\operatorname{trace}(\hat\Sigma - Q) - \lambda \cdot \operatorname{trace}(\hat\Sigma)\}$$

subject to

$$\begin{bmatrix} Q & \hat\Sigma \\ \hat\Sigma & \hat\Sigma + D \end{bmatrix} \geq 0, \quad d(\hat\Sigma + D, \Sigma) \leq \epsilon, \ \text{ with } \hat\Sigma, D \geq 0 \text{ and } D \text{ diagonal,}$$

and tabulate below a typical set of values for the rank of $\hat\Sigma$ (Table 1) as a function of
$\lambda$ and $\epsilon$. We observe a "plateau" where the rank stabilizes at 10 over a small range of
values for $\epsilon$ and $\lambda$. Naturally, such a plateau may be taken as an indication of a
suitable range of parameters. Although in the current setting a small perturbation in
the empirical covariance $\Sigma$ is allowed, the bounds for the rank in (6.1d) and (6.1e) are
still pertinent. In fact, for this example, in 7 out of 10 instances where $\operatorname{rank}(\hat\Sigma) = 10$
the bound in (6.1d) (computed based on the perturbed covariance $\hat\Sigma + D$) has been
tight and is thus a valid certificate. For the same range of parameters, the bound in
(6.1e) has been lower than the actual rank of $\hat\Sigma$. In general, the bounds in (6.1d) and
(6.1e) are not comparable as either one may be tighter than the other.
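
The data-generation step of this example can be sketched as follows. Only the sizes and distributions are taken from the description above; the random seed and the variable names are arbitrary choices, and the semidefinite program itself (which would require an SDP solver) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)                   # seed chosen arbitrarily
n, r, T = 50, 10, 100

F = rng.standard_normal((n, r))                  # factor loadings
V = rng.standard_normal((r, T))                  # factors
noise_var = rng.uniform(1.0, 10.0, size=n)       # diagonal of the noise covariance
X_tilde = np.sqrt(noise_var)[:, None] * rng.standard_normal((n, T))

X = F @ V + X_tilde
Sigma = X @ X.T
Sigma /= np.trace(Sigma)                         # scale so that trace(Sigma) = 1

print(Sigma.shape, np.trace(Sigma), np.linalg.matrix_rank(F @ V))
```

The noise-free part `F @ V` has rank `r`, which is the value at which the tabulated rank of the estimated signal covariance is expected to plateau.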



$\lambda \backslash \epsilon$            0.08    0.10    0.12    0.14    0.16
   1        46      26      24      23      22      22
   5        46      17      14      10      10       9
  10        45      16      12      10      10       8
  20        45      15      12      10      10       8
  50        45      15      12      10      10       8
 100        45      15      11      10      10       8

Table 1: $\operatorname{rank}(\hat\Sigma)$ as a function of $\lambda$ and $\epsilon$



10. Conclusions. In this paper we considered the general problem of identifying 
linear relations among variables based on noisy measurements, a classical problem of
major importance in the current era of "Big Data." Novel numerical techniques and
increasingly powerful computers have made it possible to successfully treat a number
of key issues in this topic in a unified manner. Thus, the goal of the paper has been to
present and develop in a unified manner key ideas of the theory of noise-in-variables
linear modeling.

More specifically, we considered two different viewpoints for the linear model 
problem under the assumption of independent noise. From an estimation viewpoint, 
we quantify the uncertainty in estimating "noise-free" data based on noise-in-variables
linear models. We proposed a min-max estimation problem which aims at a uniformly
optimal estimator; the solution can be obtained using convex optimization. From the
modeling viewpoint, we also derived several classical results for the Frisch problem
that asks for the maximum number of simultaneous linear relations. Our results
provide a geometric insight to the Reiersøl theorem, a generalization to complex-valued
matrices, an iterative re-weighting trace minimization scheme for obtaining solutions
of low rank along with a characterization of fixed points, and certain computationally
tractable lower bounds to serve as certificates for identifying the minimum rank.
Finally, we consider regularized min-max estimation problems which integrate various
objectives (low-rank, minimal worst-case estimation error) and explain their
effectiveness in a numerical example.

In recent years, techniques such as the ones presented in this work are becoming
increasingly important in subjects where one has very large noisy datasets including
medical imaging, genomics/proteomics, and finance. It is our hope that the
material we presented in this paper will be used in these topics. It must be noted that
throughout the present work we emphasized independence of noise in individual vari- 
ables. Evidently, more general and versatile structures for the noise statistics can 
be treated in a similar manner, and these may become important when dealing with 
large databases. 

A very important topic for future research is that of dealing with statistical errors 
in estimating empirical statistics. It is common to quantify distances using standard 
matrix norms, as is done in the present paper as well. Alternative distance measures
such as the Wasserstein distance mentioned in Section [9] and others (see e.g., [32] ) 
may become increasingly important in quantifying statistical uncertainty. 

Finally, we raise the question of the asymptotic performance of certificates such
as those presented in Section 6. It is important to know how the tightness of the
certificate to the minimal rank of linear models relates to the size of the problem.